Way of the dragon


To boost its research quality and innovation, China must strengthen its
scientific foundations and let researchers, not policymakers, set the
agenda for innovation and discovery.


There is increasing excitement over China's scientific rise. The
nation has more researchers than any other country and it is
rapidly catching up with the United States in the number of
scientific papers published. But there are lingering questions, both
within China and outside, about the quality and inventiveness of
science coming out of the country.

Concerns over science in China go to the very top. Xi Jinping,
China's leader, offered a particularly harsh assessment late last month
at a meeting of the country’s leading scientific academies. He went so
far as to say “the country’s S&T foundation remains weak”.

Xi has a point. Many of the inventions that gave rise to some of
the most important scientific work in China (CRISPR-Cas9
gene-editing tools among them) are the products of colleagues overseas.
Xi put it like this: “The situation, in which our country is under
others’ control in core technologies of key fields, has not changed
fundamentally.”

From that angle, China still looks like a nation of large-scale
implementers. Take an idea, especially one that requires scale, and
China is there to jump on it. That is not a bad place to be: the
genome-sequencing giant BGI and a new generation of sequencing
rivals are a clear sign of just how productive scale can be. But that is
application, not the kind of breakthrough that Xi seeks.

That’s why the country’s first scientific Nobel prize, awarded last
October to Tu Youyou for her role in developing the antimalarial
drug artemisinin, provoked pride but also soul-searching. It was a
discovery from a bygone era, not a product of the current research
structure, and many wonder whether today’s system will yield any
big discoveries.

In a special issue this week, Nature looks at China's potential and
the obstacles it faces (see www.nature.com/chinafocus). Xi told the
meeting that “scientists should be allowed to freely explore and test the
bold hypotheses they put forward”. He encouraged the development
of a system in which science policy is created by scientists, rather than
at the whim of officials, and alluded to experts who “should no longer
have to follow their superiors’ orders”.

If anyone can break the bureaucrats’ hold on scientific
policymaking, it is Xi, who has emerged as China's strongest leader in
decades. He has already taken on, and taken down, numerous
political foes. And yet, as China implements its latest five-year plan and
overhauls its major funding mechanisms, there is reason to wonder
how much things will change.

Xi couches much of his support for science as the quest for
translatable results. Scientists should, he says, solve urgent economic
[...] is a high priority. These are fine
objectives, but they suggest continued top-down policymaking. The
balance between encouraging basic research and demanding
technological output must be guarded closely, or scientists will be
pressured to do only translatable research and China will tread on the
freedom
of scientific pursuit that Xi holds is essential.

Although Xi seems to understand the scientific thirst for
independence and freedom, the ongoing question is whether China
will offer that. This includes freedom to use tools such as Google
Scholar.

Xi faces some of the greatest battles of China’s recent past: military
tussles in the South China Sea have raised the political stakes abroad,
economists talk of a dangerous slowdown, and environmental
problems are frustrating citizens at home and threatening the country’s
international stature. Xi vows to raise spending on science, but it
would be a mistake to think that increasing spending on research
and development will solve all the issues of the homeland, make
food and drugs safe, resolve the problem of an ageing population
and get rid of the disparities between urban and rural China.

At the meeting, Xi said: “Currently, the state needs the strategic
support of science and technology more urgently than any other time
in the past.” But truly pioneering science is to be cultivated, not
commandeered. How well that distinction is maintained will determine
much of what lies ahead.




Data sharing 


Pooling clinical details helps doctors to diagnose 
rare diseases — but more sharing is needed. 


When doctors in Ottawa saw a child with an unusual developmental
disorder last year, they were stumped. Their patient had an
abnormally small head and face and had been slow to develop. They
sequenced the child’s genome hoping to find a genetic explanation, but
came up with too many possible candidate genes to pinpoint a likely
culprit. This still happens a lot in medicine: people with rare problems
go undiagnosed. And that’s one reason behind a big push in science in
recent years: the pooling and sharing of clinically relevant information.

In the Ottawa case, the doctors got lucky. They were able to search
a database that contained information about other patients with
undiagnosed diseases, and when they did so they found a second
person with similar symptoms, and an identical mutation in one
gene, EFTUD2. The finding allowed the Ottawa doctors to diagnose
their patient with a disease called mandibulofacial dysostosis with
microcephaly, and to begin to understand why mutations in EFTUD2
cause the disease’s symptoms.

That’s the upside of the new era of data sharing. But there is a pos- 
sible downside too: invasion of privacy. Massive genetic studies in 
countries such as the United States, Qatar, Saudi Arabia and Brazil 
are collecting genetic data on millions of people, so there is a chance 
that a person’s identity could be dragged from those data — especially 
if they are linked to clinical information, such as medical history. The 
risk is that someone who volunteers their DNA could see their medi- 
cal problems opened to public scrutiny. 

This is a legitimate concern for many researchers, and is one 
reason why data sharing is easier said than done. Others include 
the lingering sense of ownership, and the career benefits offered 
to those who have privileged access. Those concerns relate to the 
standard model of data sharing, in which different groups of sci- 
entists deposit their results into centralized databases. This model 
has had some success, but researchers have already encountered 
problems, such as how to grant and control access to the pooled 
information. 

Pooling it in the first place becomes more difficult as the data sets 
get larger and the underlying techniques more varied. Imagine the 
difficulty of finding a specific book by gathering all the contents of 
a dozen different national libraries and then devising a way to inte- 
grate the numerous ways in which they are filed, tracked, recorded 
and made available. It would be much easier to ask each library 
whether it holds that book. What if data sharing in science could 
go the same way? 

The diagnosis of the Ottawa child shows that it can. The doctors 
tapped into a system that is part of the Matchmaker Exchange, which 
allows researchers to query multiple databases of information on 
patients with undiagnosed rare diseases. A doctor can feed the system 
information about a patient’s symptoms and genetic make-up, and 
then ask it whether other people have them too. (Normally, it’s hard
for doctors to find other patients with similar rare diseases; often they
learn about such cases by word of mouth.)
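
The "ask each library" idea can be sketched in a few lines of Python. This is a toy illustration only and not the Matchmaker Exchange's actual interface: the class names, the record fields, the matching rule (same gene plus at least one shared symptom) and the sample cases are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PatientRecord:
    """A de-identified case held by one database (hypothetical schema)."""
    case_id: str
    gene: str
    symptoms: frozenset


@dataclass
class Database:
    """One participating database; its records are never pooled elsewhere."""
    name: str
    records: list = field(default_factory=list)

    def match(self, gene, symptoms, min_shared=1):
        """Return only the cases with the same gene and enough shared symptoms."""
        return [
            r for r in self.records
            if r.gene == gene and len(r.symptoms & symptoms) >= min_shared
        ]


def federated_query(databases, gene, symptoms, min_shared=1):
    """Ask each database the same targeted question and collect the hits."""
    symptoms = frozenset(symptoms)
    hits = []
    for db in databases:
        for record in db.match(gene, symptoms, min_shared):
            hits.append((db.name, record.case_id))
    return hits


# Invented sample data: database names and case identifiers are placeholders.
db_a = Database("db_a", [
    PatientRecord("a-1", "EFTUD2", frozenset({"microcephaly", "developmental delay"})),
])
db_b = Database("db_b", [
    PatientRecord("b-7", "BRCA1", frozenset({"breast cancer"})),
])

hits = federated_query([db_a, db_b], "EFTUD2", {"microcephaly"})
print(hits)
```

Each database answers only the targeted question and hands back matching case identifiers, so the bulk of its data stays under its owner's control.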

The Matchmaker Exchange exemplifies a subtle shift in how
researchers think about data sharing, and one that more
scientists should engage with. It was created by the Global Alliance
for Genomics and Health, a 3-year-old organization with more
than 700 members from 70 countries that aims to help researchers,
doctors and patients to make scientific progress by sharing data
(see Global Alliance for Genomics and Health Science 352,
1278-1280; 2016).

The alliance is creating technological tools that allow researchers
to find out where data that are relevant to their patients are held
around the world. It aims to make data not just shareable but
discoverable, too. Doing this allows those who produce the data to
keep more control of the information. It also streamlines searches.
For example, researchers looking for a diagnosis want to know the
symptoms that other doctors have seen in people with particular
genetic traits. Thus they just want to know who might have seen these
mutations and what symptoms might have been observed in patients who
have them; they don’t want to comb through all the existing databases
of genetic information themselves.
Of course, there are still many instances in which accumulating 
and sharing large amounts of data — on particular genetic traits, 
for example — is essential and valuable. The gene-testing company 
Myriad Genetics is locked in a tussle with doctors and patients who 
want it to open up its massive database of information on variations 
in the BRCA1 and BRCA2 genes, which are linked to a higher risk 
of breast and ovarian cancer. (Another alliance project, the BRCA 
Exchange, seeks to provide easily searchable interpretations of BRCA 
variants that have been shared by groups outside Myriad.) 
But in other cases, data access works best, for both sides, when
the requests for information are targeted at specific traits. And as the
technology to permit that improves, so will smart sharing.


At gunpoint 
The problem of gun violence in the United States 
must finally be addressed. 


It has been a bloody year in the United States. So far, the country has
lost around 6,000 lives to gun violence, dozens of them in mass
shootings in public spaces. The attack that left 49 men and women
dead in Orlando, Florida, this month is, by some counts, the 136th mass
shooting in the United States just this year.

Mourning, and then moving on, in the wake of a mass shooting
has become a sombre tradition. But after Orlando, a new development
emerged. On 14 June, the American Medical Association (AMA)
declared gun violence a public-health crisis, and announced that it
will apply its considerable lobbying power to pressure Congress to fund
research into this violence. It is cause for optimism that a lengthy freeze
on federal funding for such research, particularly at the Centers for
Disease Control and Prevention (CDC), may soon thaw.

It makes sense that this push would come from the medical
community: it has a front-row seat on the violence. “Here we are again,”
physicians wrote in a New England Journal of Medicine editorial in
January, following a shooting in San Bernardino, California, that killed 14
and injured 22. Six months later, at a press conference following the
Orlando tragedy, one surgeon choked back tears as he described the
chaos in an emergency room filled not only with the injured, but also
with hundreds of their panicked friends and families. Another coolly
described the reality that surgeons at his Orlando trauma centre face
daily: people wounded by high-calibre assault rifles, once considered
to be the exclusive domain of the military, now flooding into civilian
emergency rooms.

Yet while doctors struggle to treat the wounded, the CDC has been 
hamstrung in tackling fundamental public-health questions about the 
causes of gun violence and its possible solutions. An amendment placed 
on appropriations bills since 1996 has prohibited federally funded 
research from advocating gun control — a provision that some have 
interpreted as making gun-violence research broadly off limits. 

In 2013, US President Barack Obama explicitly stated that such 
research should take place and need not be interpreted as advocacy, 
but Congress failed to allocate funds in the CDC budget to support it. 
(The US National Institutes of Health, which has more discretion in 
how it applies its funding, has sponsored some gun-violence research 
following Obama’s announcement.) 

The AMA is a lobbying powerhouse: in 2015, it was the fourth- 
largest lobbyist in the country. If it chooses to make gun-violence 
research a high priority, it has the resources to make headway. But it will 
take a tremendous push — and coordination with other stakeholder 
organizations — to do so. 

In the wake of the Orlando shooting, lawmakers followed what has
become a legislative post-mass-shooting tradition: the rapid-fire
proposal, and equally rapid rejection, of bills intended to address the
country’s gun-violence crisis. Earlier this week, the US Senate defeated
five such measures. Similar proposals, including one intended to
explicitly allow research into gun violence, met the same fate last
December. But with concerted effort from the AMA and others,
perhaps the United States will break with these traditions.

Bring climate change back
from the future

The ‘shock’ over an Australian extinction shows that we still don’t accept that
global warming is a problem for now, says James Watson.

Climate change has claimed its first mammal casualty, with the
reported extinction of the Bramble Cay melomys (Melomys
rubicola). The last of these Australian rodents is thought
to have disappeared around 2009, but the release last week of a report
by the Queensland government stating the probable extinction of the
species and the cause (sea-level rise induced by climate change)
made worldwide news.

The death of the last individual of the last population of a mammal
species, indeed of any species, is as irreversible as it is profoundly sad.
Yet the widespread coverage of this extinction and the subsequent
outpouring of concern from across society tapped into something
else. Species go extinct every day with little fanfare or report. The last
Australian mammal to go extinct before the melomys was the
Christmas Island pipistrelle (Pipistrellus murrayi) in
2009, with almost no press. The melomys extinction
was covered because it ended the idea that
climate change will be a concern for species
only in the future. That reflects a fundamental,
widespread problem with how we think about
and report on climate change, especially when
it comes to nature and conservation. Too many
people still think that climate change is a problem
that we can deal with later.

It’s easy to see why. Climatologists use
long-term forecasts, on timescales such as
50-100 years, and for good reason. It takes long
periods of time for alterations in atmospheric
concentrations of greenhouse gases to cause
change. Looking ahead for a scientist brings
increased certainty; we know that there will
be a problem to address. And politicians like to
emphasize the long term for the opposite reason: they can stress the
uncertainties in the detail, and talk about action without needing
to take any. Yet these distant forecasts have also become the basis of
how people assess and communicate the probable effects of climate
change on species and ecosystems. And as the Bramble Cay melomys
shows, we are seeing those impacts now.

The world’s climate system is already seriously disrupted: the global
average temperature is already nearly 1 °C warmer than it should be.
Across Earth, we are seeing radical shifts in daily temperatures,
rainfall regimes and the timing of seasons, as well as overall increases in
the number and intensity of droughts, cyclones and floods. It is now
accepted that we have moved beyond the natural climate cycle and
that, even if climate-mitigation policies are implemented immediately,
it will take centuries to recover.

Nature is in the firing line. Climate change introduces new threats and
speeds up existing declines. There is an avalanche of extinctions
coming because of the direct impacts of change: temperature, rainfall and
sea-level rise. But that is not the end of it. Climate change also interacts
with other major forces that have precipitated the current extinction
crisis, most of which are also driven by human actions. Vulnerable
human communities are responding to the changing climate, and
adding significant pressure to already degraded ecosystems. For example,
expansion of agricultural activities owing to more favourable rainfall
regimes across the Albertine Rift and the valleys of the Congo Basin now
increasingly threatens the most biodiverse regions in Africa.

If we are going to have a fighting chance to avert the current 
extinction crisis, we must accept and communicate that climate 
change is already upon us and that proactive action is needed now. 
We should not treat the news of the extinction of the melomys as an 
interesting question for Trivial Pursuit or an undergraduate exam 
— we need to treat it as a lesson. 

This species did not live in a place where its 
existence came into conflict with other societal 
needs, such as good farming land or places to 
live. It was on an uninhabited island, effectively 
protected from other threats. A wide range of 
actions could have been taken to manage its 
population without causing conflict with other 
competing agendas. 

Australian mammals are well researched,
and given the melomys’s habitat requirements,
the island’s low elevation and the fact that there
is widespread knowledge of increasing sea
levels across coastal Australia, it was not hard to
work out that the species was in dire trouble. Yet
almost nothing was done in time: there were no
proactive plans to monitor the melomys, move a
few individuals to create a rescue population or
create a simple sea-level barrier. No action was
taken because of the attitude that climate change is not really
happening yet, and there is time to sort it out.

This is unacceptable. We need a fundamental shift in how the
scientific community, the media, policymakers and environmental
funders view and discuss climate change. When we think about the
impact of climate change on biodiversity, we need to start framing the
issue as something that is already well under way and that, in
conjunction with other threats, needs to be managed now. Crucial to this will
be research on what species are immediately threatened by climate
change, followed by plans to help them to survive. It will be
complicated, but to give nature a chance, we need to harness the fears of the
future to address the realities of the present.


James Watson is an associate professor at the University of 
Queensland in Brisbane, president of the Society for Conservation 
Biology and director of the science and research initiative at the 
Wildlife Conservation Society. 

e-mail: jwatson@wcs.org
RESEARCH HIGHLIGHTS

Selections from the
scientific literature


Bone hormone 
boosts exercise 


A hormone released from 
bones enhances muscle 
function during exercise, 
giving old mice the capabilities 
of young ones. 

Gerard Karsenty of 
Columbia University Medical 
Center in New York City and 
his colleagues found that 
blood concentrations of a 
hormone called osteocalcin 
increased during aerobic 
exercise in mice, monkeys 
and people. The hormone 
helped muscles to adapt to 
exercise by increasing their 
uptake and use of glucose 
and other nutrients. Mice 
that lacked the gene for 
osteocalcin had diminished 
exercise capacity. 

Blood concentrations 
of osteocalcin declined 
as animals aged, and 
administering the hormone 
to 15-month-old mice gave 
them the exercise capacity of 
3-month-old animals. 

Cell Metab. 23, 1078-1092 
(2016) 


Early galaxy has
wisps of oxygen


Astronomers have detected
oxygen in a 13-billion-year-old
galaxy, the first time
that the gas has been found
at such an early stage of the
Universe.

A team led by Akio Inoue
at Osaka Sangyo University
in Daito, Japan, used the
powerful Atacama Large
Millimeter/submillimeter
Array (ALMA) in Chile
to measure the chemical
make-up of the galaxy, which
was discovered in 2012.
Oxygen was only one-tenth
as abundant as it is in the Sun,
and the galaxy seemed to be
low in neutral gas and dust.
Such characteristics may
have allowed ultraviolet
light from the stars of this
and other similar galaxies
to escape and ionize the
hydrogen atoms in the
early Universe, eventually
generating the levels of ions
seen today.
Science http://doi.org/bj5z (2016)


Chameleons’ sticky spit grabs prey


Adhesive mucus allows chameleons to snare
insects with their long tongues.

Pascal Damman at the University of Mons
in Belgium and his colleagues collected mucus
from the tongue pads of veiled chameleons
(Chamaeleo calyptratus; pictured) and found that
it is 400 times more viscous than human saliva.
Using a model of chameleon tongue strikes, the
team estimated that the mucus allows the animal
to capture insects that are up to 60% of its body
size, larger than its natural prey.

The size of prey a chameleon can nab is
therefore not limited by the stickiness of its
tongue, the authors say.

Nature Phys. http://dx.doi.org/10.1038/nphys3795
(2016)


Smart birds have
big brains


Birds that sing or use tools
have about as many neurons
in their brains as monkeys do.

Pavel Némec at Charles
University in Prague and his
colleagues measured the brain
size of birds from 28 species,
and counted the number
of cells in the organs. They
found that intelligent birds
such as parrots and songbirds
have larger brains relative to
their body size, with much
higher neuron density, than
do less-intelligent birds
such as chickens. Moreover,
a higher proportion of the
neurons were located in the
forebrain, which controls
higher cognitive function.
Such high neuronal
densities could be
contributing to the
intelligence of the birds, the
authors suggest.
Proc. Natl Acad. Sci. USA
http://doi.org/bjzx (2016)


Microbe makes 
mice social 


Female mice that eat a high-fat 
diet produce litters with social 
deficits that are linked to 
changes in the offspring’s gut 
bacteria. 

Mauro Costa-Mattioli at
Baylor College of Medicine
in Houston, Texas, and
his colleagues compared
offspring from mothers
that ate a high-fat diet with
those from mothers on a
normal diet. The high-fat-diet
offspring spent less
time interacting with other
mice, and had reduced
bacterial diversity in their
guts. In the animals’ brains,
researchers found fewer
neurons containing oxytocin,
a hormone that is related to
social behaviour. Electrical
signalling in the ventral
tegmental area, a brain region
that processes rewarding
stimuli, did not strengthen
as it normally does following
new social interactions.

The team identified a gut 
bacterium, Lactobacillus 
reuteri, that reversed these 
abnormalities when it was 
given to the high-fat-diet 
offspring. 

Cell 165, 1762-1775 (2016) 


MEDICAL DEVICES 


Insect-eye camera 
peers inside gut 


Mini cameras at the end of a
probe that are designed to ‘see’ 
like an insect’s compound eye 
could eventually be used in 
medical endoscopes. 

Omer Cogal and Yusuf 
Leblebici at the Swiss Federal 
Institute of Technology in 
Lausanne built a dome- 
shaped device measuring 
10 millimetres wide with 
24 tiny cameras covering its 
surface (pictured). In between 
the cameras are small, light- 
emitting fibre-optic cables. The 
cameras are placed in a way 
that provides a 180° × 180° or
360° × 90° panoramic field of
view. The researchers found 
that the resolution of the 
system was 1,000 times higher 
than other insect-eye-inspired 
devices of a similar size. 

When tested inside a tube 
mimicking a human colon, 
the device could reveal areas 
that are normally missed by 
conventional endoscopes. 
IEEE Trans. Biomed. Circuits Syst. 
http://doi.org/bj3q (2016) 


THERAPEUTICS 


Antibody double 
trouble for HIV 


Genetically engineered human 
antibodies that bind to two 
targets on HIV could one day 
be used to treat and prevent 
the disease. 

‘Broadly neutralizing’ 
antibodies can block various 
HIV strains, but the virus can 
overcome them by changing 
the viral protein that the 
antibodies recognize. To 
combat this viral escape, a 
team led by Jeffrey Ravetch 
at the Rockefeller University 
in New York City developed 
HIV antibodies that can 
recognize two different spots 
on the envelope protein that 
adorns the virus. One of these 
bispecific antibodies lowered 
viral levels in HIV-infected 
mice by more than tenfold 
in comparison with broadly 
neutralizing antibodies. 

An independent team 
led by David Ho, also at 
the Rockefeller University, 
created bispecific antibodies 
that recognize both the 
HIV envelope protein and 
human proteins that HIV 
uses to infect immune cells. 
The most potent of these 
antibodies protected mice 
from becoming infected with 
HIV and decreased viral levels 
in those already infected. 

Cell 165, 1609-1620; 
1621-1631 (2016) 


Plastic waste 
turned into fuel 


Plastic from bottles and bags 
can be degraded into liquid 
fuels and waxes using available 
catalysts. 

Polyethylene is the world’s 
most common plastic, 
but is difficult to break 
down, typically requiring 
temperatures higher than 
400°C. A team led by Zheng 
Huang at the Shanghai 
Institute of Organic Chemistry 
in China and Zhibin Guan at 
the University of California, 
Irvine, used low-cost and 
widely available reagents 
called light alkanes, along
with key catalysts, to convert 
polyethylene into oils and 
waxes. The team broke down 
various forms of the plastic 
into oil at 175°C within 4 days, 
at efficiencies ranging from 
51% to 86%. 

The team also used their 
process to convert 57-72% of 
everyday plastic waste such as 
bottles and bags into oils. 

Sci. Adv. 2, e1501591 (2016)


ECOLOGY 


A climate refuge
for trees 


Forests in northeastern North 
America (pictured) could 
thrive in a warmer climate. 

How trees will react to 
a warmer environment 
is unclear; low average 
temperatures hamper 
their growth but higher 
temperatures can limit water 
availability. Loïc D’Orangeville
of the University of Quebec at 
Montreal in Canada and his 
colleagues used tree-ring data 
from more than 16,000 stands 
of black spruce (Picea mariana) 
across Quebec to track growth 
between 1960 and 2004. They 
found that north of a latitude
of 49°N, increased temperature 
had positive effects on tree 
growth, despite the lower 
availability of water. Below 
that latitude, however, only an 
increase in water availability 
boosted tree growth. 

Although boreal forests in 
central and western North 
America might be negatively 
affected by climate change, 
northeastern areas could act as
a refuge for certain trees, the
authors suggest.
Science 352, 1452-1455 (2016) 


Mitochondria 
make nerves grow 


Enhancing the mobility of 
energy-producing structures 
called mitochondria in injured 
neurons helps these cells to 
regenerate in mice. 

After an injury, some young 
neurons can regrow their 
long signalling arms known 
as axons, but mature cells 
cannot. Zu-Hang Sheng of the 
National Institutes of Health in 
Bethesda, Maryland, and his 
colleagues knocked out a gene 
called Snph in cultured mouse 
neurons. The gene encodes 
a protein, syntaphilin, that 
anchors mitochondria inside 
cells. The team found that 69% 
of young neurons lacking the 
gene began to form growing 
tips, compared with only 44% 
that had the gene. Similar 
effects were seen in adult mice, 
and injured mature neurons 
showed an increasing ability 
to regenerate with declining 
levels of syntaphilin. 

Replenishing the energy 
supply in damaged axons 
by boosting the transport of 
mitochondria could be one 
way to treat nerve injuries, the 
authors suggest. 
J. Cell Biol. http://doi.org/bj3n 
(2016) 
EVENTS


Polar evacuation 
The US National Science 
Foundation has launched a 
rare and daring mid-winter 
effort to evacuate a crew 
member from its South Pole 
research station. The person’s
name and medical condition
have not been released,
but 48 people are spending
the winter season at the
Amundsen-Scott South Pole
Station. On 20 June, two Twin
Otter aeroplanes arrived at the
British Rothera station on the
Antarctic Peninsula, which is
being used as a staging base for
the mission. The rescue team
is awaiting suitable weather
to make the 2,400-kilometre
flight from Rothera to the
South Pole station.


Scientists say stay 
As the United Kingdom
prepares to vote on whether
or not to stay in the European
Union, 5,000 researchers have
signed a letter to The Times
newspaper warning that a
British exit, or ‘Brexit’, will
damage science. The letter,
organized by the anti-exit
group Scientists for EU
and published on 20 June,
is the latest intervention by
researchers ahead of the
23 June referendum. Many
scientists say that leaving
the EU would be bad for
research, but there is also a
vocal contingent that holds the
opposing view.


Diluted vaccine

Yellow-fever vaccine could be
used effectively at a diluted dose
should the ongoing epidemic
in Africa worsen, the World
Health Organization's expert
committee on immunization
agreed on 17 June. The
epidemic, the worst in
almost 30 years, has infected
a reported 3,137 people and
killed 345 in Angola, the
worst-affected country. The
full-strength vaccine conveys
lifelong protection, but mass
immunization to control
the epidemic has depleted
stockpiles. Using the vaccine at
one-fifth of its normal strength
would provide protection for at
least 12 months (adequate in
an emergency situation) and
would leave five times as much
vaccine available.


Space-station crew comes home

An International Space Station crew, made up of Tim
Peake of the European Space Agency (ESA),
Yuri Malenchenko of Roscosmos and NASA's
Tim Kopra (pictured, left to right), returned
to Earth in a Soyuz capsule on 18 June, landing
in the plains of Kazakhstan. Peake is the first
British ESA astronaut, and Malenchenko has
now spent 828 days in space, the second most of
anyone. After arriving in orbit last December,
the three took part in astronaut-health studies
to investigate how our eyes, brains and immune
systems adapt to long-duration space flight.


JAXA pay cuts

Three top executives of the
Japan Aerospace Exploration
Agency (JAXA) are taking a
temporary pay cut because of
the loss of the agency’s Hitomi
X-ray astronomy satellite.
JAXA president Naoki
Okumura and two others will
reduce their pay by 10% for
four months from July, the
agency announced on 15 June.
After a successful launch in
February, Hitomi was lost in
March after engineers sent it
the wrong software command,
causing it to tumble out of
control and break apart.


Space fire

NASA scientists ignited the
largest-ever deliberate fire in
space on 14 June. Engineers
used a remote-controlled
hot wire to set light to a sheet
of cotton and fibreglass
measuring 1 metre by
0.4 metres. The material was
housed in an uncrewed Orbital
ATK Cygnus cargo ship that
had just left the International
Space Station. Instruments
placed around the sheet
monitored the fire as it burned
for around eight minutes, and
sent data back to Earth. The
Space Fire Experiment, or
Saffire, is the first of three tests
designed to understand how
fire spreads in microgravity,
and ultimately to improve fire
safety for astronauts.


Coffee and cancer 
Coffee is unlikely to cause 
cancer, said the World Health 
Organization's cancer agency 
on 15 June, reversing its 
previous guidance. In 1991, 
the International Agency for 
Research on Cancer in Lyons, 
France, had described coffee 
as “possibly carcinogenic to 
humans”. The latest guidance, 
which is based on a review 

of more than 1,000 studies, 
made no such connection. 
However, beverages consumed 
at a higher temperature (above 
65°C), such as piping hot 

tea and maté were deemed 
“probably carcinogenic” on 
the basis of epidemiological
evidence. The drinks had a
particular link to oesophageal 
cancers (D. Loomis et al. 
Lancet Oncol. http://doi.org/ 
bj6x; 2016). 


South Pole CO₂


Atmospheric concentrations 

of carbon dioxide at the South 
Pole have passed 400 parts 

per million (p.p.m.) for the 

first time in 4 million years, 

the US National Oceanic and 
Atmospheric Administration 
reported on 15 June. The region 
crossed this symbolic threshold 
on 23 May, and was the last 
place on Earth to do so. Average 
global CO₂ concentrations,
which registered around 

280 p.p.m. before the Industrial 
Revolution, hit 399 p.p.m. 

in 2015 and are expected to 
exceed 400 p.p.m. in 2016. 


Earth’s pet rock 


Astronomers in Hawaii have spotted an asteroid that has been tagging along with Earth for almost a century. The companion, designated 2016 HO3, is estimated to be 40-100 metres in diameter and poses no threat to Earth, NASA said on 15 June. It orbits both our planet and the Sun, drifting ahead of or behind Earth from year to year (pictured), but staying within a range of 38-100 times the Earth-Moon distance. Other ‘quasi-satellites’ have been found, but 2016 HO3 is the most stable example so far. Quasi-satellites remain close enough to Earth to make them good targets for future spacecraft missions, says Robert Jedicke, an astronomer at the University of Hawaii in Honolulu.


TREND WATCH


Belgian research institutes account for most of the jobs advertised under the European Union's Science4Refugees initiative, which collates posts open to refugee scientists and researchers. The initiative was launched last year to help highly qualified refugees to find jobs. Around 350 adverts from 19 European countries were live on the EURAXESS researcher portal as of 21 June. Of these, 145 were for jobs in Belgium. Refugees compete for the jobs on the same basis as non-refugee applicants.


FUNDING
Gun-violence crisis 


California’s state legislature voted on 15 June to establish a US$5-million firearms-violence research centre at the University of California, following the 12 June mass shooting in Orlando, Florida, that left 49 people dead and 53 injured. There is little research on gun violence; since 1996, it has been difficult, if not impossible, for the US Centers for Disease Control and Prevention (CDC) to study the topic, owing to a federal budget restriction that limits research that could be used to promote gun control. Although US President Barack Obama ordered the CDC in 2013 to resume the research, the agency says that it does not have the resources. Separately, on 14 June, the American Medical Association declared gun violence a public-health crisis, and resolved to lobby Congress to fund the CDC to investigate the causes of the problem. See page 436 for more.

[Figure labels: ‘REFUGEES WANTED’ (Trend watch chart); ‘2016 HO3 orbit around Earth’ (orbit diagram)]


Excellence strategy 


Germany’s federal government and the governments of its 16 states agreed on 16 June to permanently continue a multibillion-euro programme set up in 2005 to strengthen the performance of selected research universities. From 2017, universities will be able to apply for an extra €10 million to €15 million (US$11 million to $17 million) per year from a €533-million annual budget that the governments have earmarked for the extended initiative, renamed the Excellence Strategy. Over the next decade, the windfall will help to create up to 15 ‘excellence universities’, as well as dozens of local and regional research hubs in selected fields of science.

Belgium offers the most science jobs open to refugees in Europe.


Biggest icebreaker 


Russia launched what will 

be the world’s largest and 
most powerful icebreaker 

on 16 June in St Petersburg, 
according to state news 
agencies. The Arktika is 
currently just a hull, and lacks 
a superstructure and the two 
nuclear reactors that will 
power it, but officials say that 
it will join the growing fleet 
of Russian icebreakers by the 
end of 2017. The vessel is not 
specifically designed to enable 
research, but it arrives as 
Russia seeks to maintain year- 
round access to the Arctic, 
and as many nations pursue 
economic opportunities 
presented by thinning sea ice 
in the region. 


ITER retooled 


The long-delayed 
international nuclear-fusion 
project ITER in St-Paul-lez- 
Durance, France, is to proceed 
with a revised construction 
road map that aims to see 
‘first plasma’ by the end of 
2025, its governing board said 
on 16 June. The streamlined 
schedule will focus on building 
a doughnut-shaped vacuum 
vessel that can generate and 
confine hydrogen plasma, 

and postpones the installation 
of components for sustained 
fusion of heavy hydrogen 
isotopes. Sticking to the 
original schedule would have 
required an extra €4.6 billion 
(US$5.2 billion) from ITER’s 
sponsors. It is not yet clear 
what savings the new plan will 
bring in the short term. 
The country’s lucrative coal industry is one of many things putting climate change on the Australian election agenda.


Australian election gives 
climate researchers hope 


Nation edges towards climate consensus as campaigns avoid carbon controversy. 


BY NICKY PHILLIPS 


Australia’s elections often feature a fierce
row over climate-change policy. In
recent years, arguments over whether
and how to put a price on carbon emissions
have even swayed voters and toppled governments.
Australia is on the front lines of
climate change: it is one of the world’s largest 
coal exporters and biggest carbon emitters per 
capita, and is already experiencing increasingly 
frequent extreme weather and coral bleaching. 
But political uproar over climate change
has been more subdued in the run-up to the
election on 2 July, which pits the current
Liberal-National coalition government against
the opposition Australian Labor Party. In part,
that is because politicians are more focused
on the country’s economy. But policy analysts
say the lack of debate suggests that the
opposing parties are more closely aligned on
climate action this time around. “We’re not in
as tumultuous a place as we were in previous
years,” says Frank Jotzo, director of the Centre
for Climate Economics and Policy at the
Australian National University in Canberra. Both
parties have set emissions targets, and “under
both parties something will be done”, he says.

A poll of 250,000 Australians, published
on 2 June by the broadcaster ABC, suggested
that 63% want a price on carbon, up from 50%
before the 2013 election. “It is an issue that
keeps forcing itself into the conversation,” says
John Connor, chief executive of the Climate
Institute, a policy think tank in Sydney.

At face value, the policies of the two main 
parties seem distinct. The government, led by 
Prime Minister Malcolm Turnbull, is promising 
to continue a scheme that came into effect in 
2014, which sees companies bid for funding
for emissions-reduction programmes. From 
1 July, firms will be forced to buy carbon credits 
if they exceed a ceiling on carbon emissions. 

But falling demand for electricity means that 
is unlikely to happen to electricity generators, 
says Dylan McConnell, a research fellow at the 
Melbourne Energy Institute at the University of 
Melbourne, because the ceiling is set at a high 
point for emissions that was reached between 
2009 and 2014. He says that the government's 
policies are “definitely not adequate” to achieve 
its targets of cutting emissions to 26-28% below 
2005 levels by 2030. A spokesperson for environment
minister Greg Hunt disagreed, saying
that the government’s policies put it on track to
meet its 2030 targets.

The Labor opposition, which under leader 
Bill Shorten has a slight edge in opinion polls, 
is more ambitious: it has pledged to reduce 
emissions by 45% below 2005 levels by 2030, 
and to be carbon neutral by 2050. Labor would 
also introduce an emissions-trading scheme 
for electricity producers. “Labor’s policies are 
stronger with a clearer pathway to credibility 
than the coalition’s, but much detail remains to
be sorted,” says Connor. For example, it is not 
clear where the threshold on emissions inten- 
sity would be set. 

Climate analysts think that in practice, the two approaches could end up operating in a similar way, with the coalition’s pay-to-cut-emissions plan morphing into an emissions-trading scheme similar to the Labor proposal. “When you look at the policies, unless you’re a policy nerd, there’s not really much difference,” says Tony Wood, head of the energy programme at the Grattan Institute, a think tank in Melbourne. Hunt’s spokesperson rebuffed any comparison, saying that the government was not running an emissions-trading scheme.

With warming oceans causing extensive 
damage to Australia’s iconic Great Barrier 
Reef, rival politicians are keen to be seen as 
promising action. On 30 May, Shorten pledged 
Aus$377 million (US$279 million) in new 
funding to improve the health of the reef, if 
he is elected. Turnbull then announced that 
his government would use up to Aus$1 billion 
from an existing clean-energy programme to 
support the reef’s health through projects to 
improve water quality, reduce emissions and 
provide clean energy. But marine biologist 
Terry Hughes, director of the ARC Centre of 
Excellence for Coral Reef Studies at James Cook 


University in Townsville — who has made head- 
lines with his reef-bleaching studies (see also 
Nature http://doi.org/bj45; 2016) — says that 
the money will make little difference because 
it won’t tackle the fact that global warming is
the greatest threat to the reef. “Australia has one 
of the lowest emissions-reduction targets of any 
developed country and the highest per capita 
emissions. Those are the two areas the govern- 
ment should be addressing,” says Hughes. 

The government also refused to intervene 
when cuts to climate-change programmes at the 
national science agency, the Commonwealth 
Scientific and Industrial Research Organisation 
(CSIRO), were revealed earlier this year. The 
opposition has committed to an independent 
review of the agency and, on 12 June, promised 
CSIRO Aus$250 million extra as part of a pack- 
age to fund various science programmes, if the 
party is elected. 

Jotzo and Wood see the quieter consensus 
for action on carbon emissions as a relief after 
a decade of contentious climate politics. Those 
years saw a carbon ‘tax’ introduced by Labor's 
Julia Gillard in 2012, and then dismantled by 
a conservative coalition led by Tony Abbott in 
2014. “There is an opportunity for bipartisanship,
which is part of the reason why the toxicity
of the debate in this election hasn’t been so
strong,” says Wood.
ASTRONOMY


Giant telescope rattles rural 
South African community 


Struggle over Square Kilometre Array highlights balancing act that scientists face. 


BY SARAH WILD, NORTHERN CAPE PROVINCE 


“Move it away! We don’t want it!” a farmer shouted at a crowded meeting in Carnarvon, a small town in the semi-arid, sparsely populated Northern Cape, one of South Africa’s poorer provinces. He was talking about what will be the largest radio telescope in the world, the international Square Kilometre Array (SKA), a portion of which is due to be built nearby. Representatives from SKA South Africa, an organization of scientists, engineers and technocrats, were attending the meeting of farmers in May, in an attempt to respond to rising criticism of the project from local people. “It’s fine to be part of the international community, but how is it helping this community?” came a faceless call from the other side of the meeting hall.

In 2012, the SKA’s coordinating organization decided that it would divide its thousands of dishes and many more antennas, whose combined ‘collecting area’ for radio waves will span approximately one square kilometre, between Australia and South Africa. The site in the Northern Cape will include 197 dishes, and form part of the project’s first phase, SKA1. The 64-dish MeerKAT telescope, which will be part of SKA1, is already being built. The rest of the dishes will be added from 2018.

Last year, opposition to the Thirty Meter 
Telescope on Mauna Kea, Hawaii, prompted 
the state’s supreme court to invalidate the
telescope’s construction permit. Opposition to the
SKA is unlikely to derail the project because 
legislation protects most of the Northern Cape 
for astronomy. But SKA South Africa officials 
say that they need community buy-in if the 
project is to be sustainable over its 50-year life. 

The struggle that is playing out in the 
Northern Cape illustrates the balancing act 
that scientists who lead gigantic projects must 
pull off — to highlight the benefits that the 
project will bring to an area without over- 
inflating expectations. 

When SKA South Africa proposed the SKA project to the Northern Cape community, starting with the MeerKAT telescope in 2008, it said that the project would lead to local economic development, create jobs and improve opportunities for children through education and science. But the organization never quantified these objectives, and now its director, Rob Adam, is struggling to manage the expectations of the poorest members of the Northern Cape, who are largely ‘coloured’ people, a recognized racial classification in South Africa.

Northern Cape residents voice discontent with construction of the Square Kilometre Array.

The MeerKAT telescope under construction in South Africa’s Northern Cape will form part of the world’s largest radio telescope.

SKA South Africa has already come good 
on some of its promises. It now employs a 
high-school maths and science teacher for 
Carnarvon, for example, and is paying for five 
coloured students at Carnarvon high school 
to attend university as part of a pan-African 
bursary programme that it runs. But mem- 
bers of the coloured community complain 
that such resources haven't materialized across 
the board — not all the towns in the area have 
gained a high-school teacher, for example. 

And although a small influx of scientists, 
engineers and contractors has to some extent
improved the economies of the province's 
towns, the communities are not yet satis- 
fied. “What's in it for us?” asked one resident 
at a meeting in the Northern Cape town of 
Brandvlei in May. 

Adam says that the community’s expecta- 
tions have risen beyond what the SKA can 
provide. “You must understand, we are not 
the government, the education department 
and the police, all rolled into one,” he told the 
crowd in Brandvlei. 

The problem is different for members of the 
richer, mainly white, sheep-farming commu- 
nity of the Northern Cape, who are concerned 
about SKA South Africa’s land acquisition. 

According to the Astronomy Geographic 
Advantage Act, which was passed in 2007, 
the government has the right to acquire land 
for the project within a designated ‘core’ area 
if negotiations fail, and if the land is required 
for the SKA and the organization has offered 
a fair price. 

In 2008, the government bought Losberg 
farm, the site of the MeerKAT telescope. What 
is riling this community is that SKA South 
Africa is now eyeing 36 other farms — which 
comprise about 118,000 hectares — to accom- 
modate the further 133 dishes that make up 
SKA1.

Many farmers say that the loss of their 
farms will destroy the local, agriculture-based 
economy, and that they are being forced to 
sell. Although the amount of land needed for 
the SKA is now agreed, the farmers are also 
suspicious about the scope of the project. 
“They don't believe things will stop here,” 
says Henning Myburgh, general manager of 
farmers’ organization Agri Northern Cape in 
Kimberley. 

The spectre of Zimbabwe-style land expro- 
priation, in which the government took land 
from white farmers without compensation, is 
also present. “It’s a land grab, one way or the
other, be it for SKA or something else,” says
Eric Torr, a former resident of the province 
who owns a local aviation company. 
Expropriation would be a last resort, say 
SKA South Africa officials. “It’s not in the 
best interests of the SKA to do that because 
we have to live in this community,” says Alice
Pienaar-Marais, who is in charge of the land- 
acquisition process. She is confident that SKA 


South Africa will acquire the 36 farms by the 
end of next year, in time for SKA1 construc- 
tion in 2018. 

SKA Australia, meanwhile, “could be doing 
more” with respect to community engagement,
project director David Luchetti told Nature. 

The Australian SKA Pathfinder telescope, 
which is currently being commissioned, is to 
be built on an area that traditionally belongs 


to the Wajarri Yamatji tribe. Following the 
2009 Indigenous Land Use Agreement, which 
was negotiated between the government and 
the indigenous group, the tribe has received 
benefits worth more than Aus$18.1 million 
(US$13.5 million) in exchange for the use of 
the land for radioastronomy. 

But the agreement needs to be renegotiated 
for the SKA before construction starts.


An earthquake-triggered tsunami is thought to have drowned this Pacific Northwest forest 2,000 years ago. 


SEISMOLOGY 


Canada builds quake 
warning system 


Undersea instruments will monitor the Cascadia fault zone. 


BY NICOLA JONES 


On 15 June, Canada broke ground on its first earthquake early-warning system. Sea-floor sensors will monitor the Cascadia subduction zone off British Columbia to provide crucial seconds of warning if the ‘big one’ hits. Putting sensors so close to the fault should give the Canadian system an edge over a more developed sister project in the United States.
To produce early warnings of quakes, scientists rely on a network of seismometers and accelerometers to detect the tremor’s first, non-destructive primary (P) waves. Those waves travel faster than the destructive secondary (S) waves, and so hit cities seconds to minutes earlier. The closer that detectors are to the source of an earthquake, the more warning they can provide. That time can be used to stop high-speed trains, shut down nuclear reactors and tell the general population to brace for shaking. But with offshore faults, getting close to the action means putting sensors under water, which is very expensive.
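
The head start that P waves give can be estimated with simple arithmetic. The sketch below is illustrative only: the wave speeds (about 6 km/s for P waves and 3.5 km/s for S waves) are assumed typical crustal values rather than figures from this article, and the function name and the processing-delay parameter are hypothetical.

```python
# Illustrative sketch: rough warning time from the P/S travel-time difference.
# Wave speeds are assumed typical crustal values, not figures from the article.
P_WAVE_KM_S = 6.0   # assumed P-wave speed
S_WAVE_KM_S = 3.5   # assumed S-wave speed


def warning_time_s(city_km, sensor_km, processing_s=0.0):
    """Seconds between an alert being issued and S-wave arrival at the city.

    city_km      -- distance from the earthquake source to the city
    sensor_km    -- distance from the source to the nearest detector
    processing_s -- hypothetical delay to analyse the signal and send the alert
    """
    if min(city_km, sensor_km, processing_s) < 0:
        raise ValueError("distances and delays must be non-negative")
    alert_issued = sensor_km / P_WAVE_KM_S + processing_s  # P wave reaches sensor
    shaking_arrives = city_km / S_WAVE_KM_S                # S wave reaches city
    return max(0.0, shaking_arrives - alert_issued)


if __name__ == "__main__":
    for sensor_km in (100, 50, 10):
        t = warning_time_s(city_km=150, sensor_km=sensor_km, processing_s=3)
        print(f"sensor {sensor_km:>3} km from source: {t:.1f} s of warning")
```

Under these assumed speeds, moving the nearest detector from 100 kilometres to 10 kilometres from the source adds about 15 seconds of warning for a city 150 kilometres away, which is the motivation for putting sensors on the sea floor next to an offshore fault.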

Japan pioneered earthquake early warnings. 
The country has had a system to stop bullet 
trains since the 1960s, and public warnings have 
been issued since 2007. During the magnitude-9 
Tohoku earthquake of March 2011, residents of 
Sendai, the major city nearest to the epicentre, 
got 15 seconds of warning; Tokyo got more than 
a minute. Japan added data from an array of 
undersea seismometers to its earthquake early-warning
system in August 2011, and a second
phase of that project was completed in March,
more than doubling the number of offshore
detectors to 50. Now, the country is working on
an ambitious 150-station network called S-net.
Connected by 5,700 kilometres of cable, it could
provide up to an extra 30 seconds of warning for 
a large offshore quake. 

The United States and Canada have lagged far 
behind Japan, despite the fact that the Cascadia 
subduction zone off North America’s west coast 
is expected to one day produce a catastrophic 
‘megathrust’ quake similar to the Tohoku one. 

By the end of June, the research non-profit 
group Ocean Networks Canada (ONC) in Vic- 
toria plans to have installed three accelerom- 
eters on its NEPTUNE sea-floor observatory, 
which consists of more than 840 kilometres of 
ocean-bottom cable looped out past the Cascadia
fault (see ‘Quake watch’). “I took this job
and asked, ‘Why aren’t we doing earthquake
early warning?’” says ONC president Kate
Moran, who joined the organization in 2011 
as director of NEPTUNE. 

The network already has a handful of 
seismometers, but these send data back 
in packets instead of instantaneously, and 
the information is subject to censorship by 
the navy. As such, Moran says, they are ill- 
suited for an early-warning system. The new 
accelerometers, which have a simpler data 
stream designed to circumvent these issues, 
were made possible by a Can$5-million 
(US$3.9-million) grant from the British 
Columbia government in February. 

The team is also hoping to install a tiltmeter 
down a 300-metre borehole, to detect slow, 
almost imperceptible movement of the tectonic 
plates at the fault. Clusters of such slow-slip 
events occurred before the 2011 Japan quake, 
and detecting them might help seismologists 
to track the strain that is building on the fault. 

Moran anticipates that within 5 years, the ONC will have 40 accelerometers on- and offshore to produce public early warnings. The instruments will be positioned specifically to detect an earthquake resulting from a subduction-zone tremor. Spotting quakes from other faults, which would be smaller but potentially much closer to cities, would require a significantly denser network of accelerometers.

QUAKE WATCH
The Cascadia subduction zone, which separates the North American tectonic plate from the oceanic plates, is due for a large, destructive earthquake. Canada’s early-warning system (shown in red) will spot seismic activity before the shaking hits land.
SOURCE: ONC

JOHN STANMEYER/NGC

On the US west coast, a network of onshore 
accelerometers can already alert a select group 
of users — such as the Bay Area Rapid Transit 
system in northern California — to the early 
rumbles of earthquakes. That prototype pro- 
gramme, called ShakeAlert, is hoping to get its 
information to a wider audience soon. “I think 
we’re really now, finally, at the beginning of
rolling out a public system, after years of trying 
to get funding,” says ShakeAlert lead Richard 
Allen, a seismologist at the University of Cali- 
fornia, Berkeley, who anticipates issuing public 
alerts within five years. ShakeAlert got its first 
congressional funding in December 2014, and 
now has about half the funds it needs for a full 
system, says Allen. To reliably detect quakes 
from multiple fault lines, Allen reckons that 
the network needs about 1,100 detectors just 
in California, where it currently has about 500. 

The US National Science Foundation 
supports a handful of wired sea-floor seis- 
mometers off the coast of Oregon as part of 
its Ocean Observatories Initiative. But these 
sensors have the same problems as the cur- 
rent Canadian ones, says Martin Heesemann, 
a marine geoscientist with the ONC. He adds 
that the group’s new accelerometers will be the 
only instruments on North America’s megath- 
rust fault designed specifically for early warn- 
ing rather than research. 

The offshore Canadian system “will totally 
be better” than the US system, says Moran with 
a laugh. “It’s nice to be better than the United 
States.”


Sexes deal differently
with infection 


Quirks of immune system pose medical conundrum. 


BY SARA REARDON 


The immune systems of men and women respond very differently to infection, and scientists are taking notice. Research presented last week at a microbiology meeting in Boston, Massachusetts, suggests that the split could influence the design of vaccination programmes and lead to more targeted treatment of illness.

Hints that men and women deal with infec- 
tion differently have been around for some 
time. In 1992, the World Health Organization 
hastily withdrew a new measles vaccine after it 
was linked to a substantial increase in deaths 
of infant girls in clinical trials in Senegal and 
Haiti. It is still not clear why boys were unaf- 
fected, but the incident was one of the first 
such examples to catch scientists’ attention. 

Women might have evolved a particularly 
fast and strong immune response to protect 
developing fetuses and newborn babies, 
says Marcus Altfeld, an immunologist at the 
Heinrich Pette Institute in Hamburg, Ger- 
many. But it comes at a cost: the immune 
system can overreact and attack the body. 
This might explain why more women than 
men tend to develop autoimmune diseases 
such as multiple sclerosis and lupus. 

Yet very few studies assess men and women 
separately, so any sex-specific effects are 
masked. And many clinical trials include only 
men, because menstrual cycles and pregnancies
can complicate the results. “It’s sort of an
inconvenient truth,” says Linde Meyaard, an
immunologist at University Medical Center
Utrecht in the Netherlands. “People really
don’t want to know that what they study in
one sex is different from the other.”

Now, scientists are beginning to tease out 
some precise mechanisms. At the meet- 
ing, infectious-disease researcher Katie 
Flanagan at the University of Tasmania in 
Australia reported on a tuberculosis vac- 
cine given to Gambian infants. She found 
that the vaccine suppressed production of 
an anti-inflammatory protein in girls, but 
not boys. This boosted the girls’ immune 
responses, and may have made the vaccine 
more effective. 

Hormones also play a part. Oestrogen 
can activate the cells involved in antiviral 
responses, and testosterone suppresses 
inflammation. 

Treating nasal cells with oestrogen-like 
compounds before exposing them to the
influenza virus has revealed further clues, 
says Sabra Klein, an endocrinologist at 
Johns Hopkins University in Baltimore, 
Maryland. Only the cells from females 
responded to the hormones and fought 
off the virus (J. Peretz et al. Am. J. Physiol. 
http://doi.org/bj5w; 2016). 

Genetic factors may also guide how the 
sexes deal with infection. Meyaard studies a 
protein called TLR7, which detects viruses 
and activates immune cells. Encoded by 
a gene on the X chromosome, the protein 
causes a stronger immune response in 
women than in men (G. Karnam et al. PLoS 
Pathogens http://doi.org/bj5x; 2012). Mey- 
aard suspects that this is because it somehow 
circumvents the process whereby one of the 
two X chromosomes in women is shut down 
to avoid overexpression of proteins. 

A study set to begin later this year could help to tease apart the relative influence of genes and hormones on infection. Altfeld and his colleagues will look at 40 adults going through sex-change operations. [...] transgender women in the study should begin mounting stronger immune reactions to infections and develop more autoimmune problems than the transgender men.

Whether such results will lead to changes 
in how drugs are administered is an open 
question. In 2014, the US National Institutes 
of Health (NIH) announced that researchers 
must report the sex of animals used in pre- 
clinical research. Similar efforts are under 
way in Europe. But a 2015 report from the 
US Government Accountability Office 
(GAO) found that the NIH does a poor 
job of enforcing rules requiring that clinical
trials include both sexes (see go.nature.com/281l4nb).

According to the GAO, even if studies 
include both sexes, the NIH also does not rou- 
tinely track whether researchers have actu- 
ally evaluated any differences between them. 
Klein argues that gathering such data could 
lead to more-effective programmes — halving 
vaccine doses for women, for instance. 

“People are tending to ignore it for as long 
as possible,” Flanagan says. “People will get a 
lot of surprises.”


Iconic Antarctic
lab gets the boot 


Homeless sediment collection 
dates back to 1960s. 


BY ALEXANDRA WITZE 


metres of skinny tubes of dirt. With 
them comes half a century of Antarctic 
geological history. 

The US National Science Foundation 
(NSF) is looking for a new place to store its 
Antarctic marine-sediment cores, the world’s 
biggest collection of environmental records 
from the Southern Ocean. The cores have lain 
on shelves at Florida State University in Talla- 
hassee since 1963. But last year, the university 
told the NSF that it no longer wanted to host 
the collection. Ideas for where the Antarc- 
tic Marine Geology Research Facility might 
move to are due by 3 August. 

“This area of research is not a priority for 
the current faculty,’ says Gary Ostrander, 
vice-president for research at Florida State. 
“Tt doesn't make sense to continue to support 
that size facility” 

The NSF contributes roughly US$280,000 
per year, but the university has to pay for 
overhead costs such as air conditioning for 
the 930-square-metre building. 

The invaluable collection includes cores 
gathered in the 2000s by the international 
ANDRILL programme, which revealed the 
history of the West Antarctic Ice Sheet over 
the past 17 million years. 

The transfer is a blow for Sherwood Wise, a 
geologist at Florida State and the facility's prin- 
cipal investigator. “It ll be a sad day for me,” 
he says. “This has been a marvellous resource 
for the university.” Dozens of researchers from 
around the world visit the collection every year 
to study palaeoclimate clues and other evi- 
dence buried within the sediments. 

Over the years, more and more cores have 
accumulated, from more than 90 research 
cruises. Studies of the samples have triggered 
hundreds of publications on all aspects of 
Southern Ocean and Antarctic history. 

Curating these older materials is vital 
because Antarctic samples are so expensive 
and difficult to gather, says Philip Bart, a 
marine geologist at Louisiana State Univer- 
sity in Baton Rouge. “The facility is critical to 
ongoing research,’ he says. 

Wherever and whenever the Florida cores 
move, Wise estimates that it will take around 
$2 million just to pack them up and ship 
them.


Free to a good home: more than 23 kilo-
A computer simulation of the black-hole merger detected on 26 December 2015.


LIGO sees second 
black-hole crash 


First gravitational-wave detection was not a fluke. 


BY DAVIDE CASTELVECCHI 


Just before 4 a.m. on 26 December,
B. S. Sathyaprakash woke up to some
good news: gravitational waves had been
detected for the second time in history.

The theoretical physicist from Cardiff
University, UK, had his laptop next to his
bed, set up to alert him when he received
automated e-mail notices from computers at
the Advanced Laser Interferometer Gravita-
tional-Wave Observatory (LIGO).

“I got up and I went and checked the com-
puter. Lo and behold, there was an event
from just two minutes before,” he says. At
3:38:53 UTC (Coordinated Universal Time),
LIGO’s twin detectors in Louisiana and
Washington state had both picked up the
signature ripples of two massive objects —
probably two black holes — in the final stages
of spiralling into each other.

At the time, the international LIGO collabo-
ration and its colleagues at Virgo, a European
observatory near Pisa, Italy, were busy analys-
ing LIGO’s first discovery: the event that they
had detected on 14 September. The scientists
would announce that finding in February, to
great global fanfare. They did not run a full
analysis of the second event until weeks later,
says LIGO physicist Bruce Allen, managing
director of the Max Planck Institute for Gravi-
tational Physics in Hanover, Germany.

“It was absolutely mind-boggling how,
within a few months of the first event, we had
a second one,” Sathyaprakash says.


SECOND SUCCESS 

The second detection “shows that this whole 
business is not a fluke”, says Clifford Will, 
a theoretical physicist at the University of 
Florida in Gainesville, who is not a member 
of either the LIGO or Virgo teams. In princi- 
ple, the September discovery could have been 
a huge stroke of luck, but the second event 
suggests that there is a large population of 
black-hole pairs out there that will produce 
frequent mergers. LIGO and Virgo can look 
forward to regular detections, says Will, who 
studies gravitational waves and other predic- 
tions of Albert Einstein’s general theory of 
relativity. “This is going to be a new kind of 
astronomy.” 

Einstein predicted that any accelerating or 
rotating bodies should produce ripples in the 
fabric of space, which are vaguely similar to 
sound waves but move at the speed of light 
and can propagate in a vacuum. 

NUMERICAL-RELATIVISTIC SIMULATION: S. OSSOKINE/A. BUONANNO (MAX PLANCK INST. GRAV. PHYS.)/SIMULATING
EXTREME SPACETIME PROJECT; SCIENTIFIC VISUALIZATION: T. DIETRICH/R. HAAS (MAX PLANCK INST. GRAV. PHYS.)

Detailed analysis of the second detection
confirmed that the signature had to be the
ripples from a pair of black holes (see video at
go.nature.com/28lwdkf). This time, the signal
from the gravitational waves lasted for one full
second, instead of one-fifth of a second as in
the first event. The second event encompassed
the objects’ last 27 orbits around each other,
compared with just 5 or so from the first detec- 
tion. This enabled the researchers to get a test of 
general relativity that was in some respects twice 
as precise as their test during the first detection. 
This was true even though the September 
event was ‘louder’ than the December event. 
The first two black holes weighed as much as 36 
and 29 times the mass of the Sun, respectively, 
whereas the holes in the second pair were rela- 
tively lightweight, at 14 and 8 solar masses, and 
radiated one-third as much gravitational energy. 
(In both cases, the black-hole pairs might have 
been orbiting each other for millions or billions 
of years, but LIGO captured only the finale, 
when the orbits and the gravitational waves 
that they produced had frequencies within the 
observatory’s window of sensitivity.) 


WELL-TIMED GIFT 
The LIGO and Virgo scientists estimate that 
both collisions occurred more than 400 mega- 
parsecs (1.3 billion light years) from Earth, 
although the distances could not be measured 
precisely. The teams presented their latest 
results on 15 June at a meeting of the American 
Astronomical Society in San Diego, California, 
and published them in Physical Review Letters 
(B. P. Abbott et al. Phys. Rev. Lett. 116, 241103; 
2016). 

The latest discovery was especially
exciting for physicist Chad Hanna, a LIGO
collaborator at Pennsylvania State University 
in University Park. When Hanna got the alert 
by text message, it was still Christmas Day in 
the United States, and he was with family. The 
collaboration requires members to keep data 
completely confidential, so Hanna hopped out 
of his chair, got his laptop and went upstairs 
to an empty room. At first, he was sceptical, 
he says: “I just didn’t think the Universe had 
such a sense of humour to send a real event 
on Christmas.” 

But he soon realized that although the 
signal was relatively quiet, it was genuine. It 
was a crucial test for the software that combs 
the data from the two observatories in real 
time, which he helped to design. The system 
could catch events even if they were half-bur- 
ied in noise, but didn’t produce too many false 
positives. 


RESET 
Ultimately, the automated alerts from LIGO 
will notify dozens of teams of astronomers 
as well. The researchers will then reposition 
their telescopes in the hope of detecting vis- 
ible light, or other electromagnetic waves, that 
originate from the same events that produce 
the gravitational waves. 

After its first four-month science run 
between September 2015 and January 2016,
the US$620-million Advanced LIGO has shut
down for an upgrade. It is scheduled to start 
another run this September, accompanied by 
an upgraded Virgo. 

The results released last week complete 
the search for black-hole mergers in LIGO’s 
autumn run, but the collaboration is still sift- 
ing through its data for other types of event 
— and may yet announce further discoveries 
even before the next run begins. In particular, 
the international Einstein@Home project is 
looking for signals with the help of computers 
belonging to volunteers around the world.


CORRECTIONS 

It costs the United Kingdom £161 million 
per week to be in the EU, not £250 million, 
as stated in the Editorial ‘Turning point’ 
(Nature 534, 295; 2016). The News
story ‘Gene therapies pose million-dollar
conundrum’ (Nature 534, 305-306; 2016) 
should have said that cancer drugs that 
unleash the power of the immune system 
cost up to $40,000 per month, not per year. 
And the News Feature ‘The secret history 
of ancient toilets’ (Nature 533, 456-458; 
2016) incorrectly said that roundworms 
and whipworms cause dysentery — they 
cause problems such as malnutrition. 
Science in China


A special issue looks at the country’s astonishing scientific trajectory
as it seeks to secure its spot among the leaders in innovation.


Chinese science has been moving at breakneck speed
for the past few decades, fuelled by vast infusions
of cash and a rapidly growing technical workforce.
China now boasts more researchers than the United States, 
outspends the European Union in research and develop- 
ment and is on track to best all other nations in its yearly 
production of scientific papers. But there have been bumps 
along the way. Chinese research has generally had low 
impact, and there have been persistent concerns about 
quality, which the country is trying to address. 

This special issue looks at the state of science in China.
As part of a new 5-year plan, Chinese leaders have pledged
to boost research funding to 2.5% of the country’s gross
domestic product by 2020. An infographic (page 452)
charts the rapid rise of Chinese science and examines some of
its problems. On page 456, profiles of ten of the nation’s leading
scientists show the breadth and promise of research in fields
ranging from neuroscience to neutrinos. One area in which the
country is vying to lead the world is DNA sequencing — and an
article on page 462 shows that it wants to dominate precision
medicine, too. An Editorial (page 435) notes that even with its
impressive scientific gains, China still has far to go before it
becomes a leader in innovation. Wei Yang, head of the
National Natural Science Foundation of China, which is
the leading funder of scientists in the country, says in a
Comment (page 467) that China needs to improve the
quality, integrity and applicability of its basic research.
And policy specialist Douglas Sipp and stem-cell biologist
Duanqing Pei argue on page 465 that, contrary to common
perceptions, China offers lessons for other nations
on how to govern ethically sensitive research in the life
sciences. In this and other areas of science, the
rest of the world will be watching
closely as China races forward.


SCIENCE IN CHINA
A Nature collection
nature.com/chinafocus
CHINA


by the numbers

Research capacity has grown rapidly, and now
quality is on the rise.

BY RICHARD VAN NOORDEN


BIG SCIENCE


The scale of some of China’s experimental facilities — from one of the world’s largest telescopes to the
deepest underground laboratory — showcases the country’s soaring research ambitions.


QUANTUM-COMMUNICATION HIGHWAY
The world’s longest quantum-communications network,
connecting Beijing to Shanghai, opens in July.

JIUQUAN SATELLITE LAUNCH CENTRE
China launched its first scientific mission from
here in 2015, and plans two more this year.

SHANGHAI SYNCHROTRON RADIATION FACILITY
Was China’s biggest …

500-METER APERTURE SPHERICAL TELESCOPE (FAST)
The world’s largest single-aperture radio
telescope should be completed this year.

DAYA BAY NEUTRINO DETECTOR
Measures rate of neutrino oscillations.

PANDAX DARK-MATTER DETECTOR
World’s deepest underground laboratory.

JIANGMEN UNDERGROUND NEUTRINO OBSERVATORY (JUNO)
Under construction, for 2020.

China’s blazing economic growth has
cooled in recent years, but the nation’s
scientific ambitions show no signs of fad-
ing. In 2000, China spent about as much
on research and development (R&D) as
France; now it invests more in this area
than the European Union does, when
adjusted for the purchasing power of its
currency. That surge in funding has paid
off. China now produces more research
articles than any other nation, apart from
the United States, and its authors fea-
ture on around one-fifth of the world’s
most-cited papers. Top Chinese scientific
institutions are breaking into lists of the
world’s best, and the nation has created
some unparalleled facilities.

There’s room for improvement within
that bright picture. China steers much less
of its R&D funding towards basic research
than do many science powerhouses, and
its international collaboration rates are
on the low side. The scholarly impact
of its papers is improving rapidly, yet it
remains below the world’s average. And
although China boasts more than 1.5 mil-
lion researchers, that’s a small number
given its vast population. The country’s
leaders recognize some of the weaknesses
and have pledged to increase funding for
science and technology, aiming particularly …


TOP INSTITUTIONS


The Chinese Academy of Sciences and the nation’s leading universities produce tens of thousands
of papers every year. A sizeable portion of those rank among the world’s best.

Key: 250 papers published in 2015; papers in the world’s top 10%. Rounded to nearest 250 papers.

CHINESE ACADEMY OF SCIENCES
Network of institutes headquartered in Beijing
36,996 papers / 19% in the top 10%

9,429 / 14%   9,017 / 18%   9,030 / 20%

FUDAN UNIVERSITY
Shanghai
6,072 / 18%

NANJING UNIVERSITY
Nanjing
5,207 / 19%
SPENDING


China’s science spending is soaring as the country’s economy grows and it 
devotes a greater share to R&D. In absolute terms, China’s R&D spending is still 
only about two-thirds of Europe’s. But when its lower wages are taken into 
account, this translates into a purchasing power that surpasses that of the EU and 
is on track to overtake the United States by the end of this decade. 


Chart: research spending adjusted for purchasing power (current US$ billions); series shown include the United States and the EU 28.

The country now invests more than 2% of its gross domestic product (GDP) on
R&D — a greater proportion than the European Union — and wants to reach 2.5%
by the end of this decade. However, only 5% of China’s R&D spending goes to basic
research — a much lower proportion than that of other leading nations. Most of
China’s R&D funding is aimed at commercially-related technology development.
China now has more scientists than the United States. But with a population of more than 1.3 billion, it trails
behind other major science nations in terms of the density of its scientific workforce. 


Chart: number of researchers (millions) for the EU 28, China, the United States, Japan, Germany and South Korea.


OUTPUT 


In the past decade, China’s share of the world’s research 
articles has surged from 13% to 20% — and its share of 
the world’s top-cited articles has shown similar growth. 
Figures from China’s major basic-science funding
agency, the National Natural Science Foundation 
(NSFC), suggest that women now receive around 
one-quarter of the grants — a figure comparable with 
that reported by research agencies in the United 
States and Europe. 


Chart: grants awarded in 2015, NSFC (China).


The scholarly impact of the country’s output overall remains below the world’s average — but it is rapidly 
improving. The country has its highest impact in the chemical sciences. 
From ancient DNA to
neutrinos and neuroscience,
top researchers in China are
making big impacts — and
raising their country’s
standing in global science.
WU JI


The country’s top space-science official aims high with 
bold research missions. 


BY CELESTE BIEVER 


A fleet of model spacecraft decorates Wu Ji’s office, including the Chang’e 3 lander and its
Yutu rover that made up China’s first mission to explore the Moon's surface. That expedi-
tion in December 2013 captivated the world and signalled China's vast ambitions in space.
But for Wu, who has been director-general of China’s National Space Science Center (NSSC) in
Beijing since 2003, a much bigger turning point came almost three years earlier.

On 11 January 2011, he learned that his centre, which is a division of the Chinese Academy of
Sciences (CAS), had won funding for a flotilla of spacecraft dedicated to scientific discovery. Up
to that point, say Wu and others, almost all of China's space missions had been geared primarily
towards advancing national prestige or demonstrating technological prowess.

The 2011 announcement marked the culmination of more than a decade of research, per-
suasion and international collaboration, mainly on the part of Wu — and the start of a new
era in Chinese science. “China has changed direction, and he has been the most important
player,” says Roger-Maurice Bonnet, former director of science at the European Space Agency,
who is an adviser to the NSSC and a scientist at the non-profit International Space Science
Institute in Bern.

Two of the NSSC missions have launched. One of them is Wukong, a space telescope hunting
for signs of dark matter, which is thought to make up 85% of the matter in the Universe. “The
data is coming down every day,” says Wu. The mission’s team may have an announcement by the
end of the year that could “be a mark in science history”, he says.

Next up in 2016 will be the world’s first space-based experiment to probe the phenomenon of
quantum entanglement, and the Hard X-ray Modulation Telescope (HXMT), which will survey
a broad region of the sky with greater sensitivity at high energies than other wide-field telescopes.

The funding for these missions has totalled about 3 billion yuan (US$455 million) since 2011,
and Wu succeeded in winning the cash by persuading the top brass at the CAS and China's
central government that his agency’s proposals for basic space-science missions would deliver
breakthroughs. That message resonates with the government's push to invest more in funda-
mental research.

In person, Wu is hyper-focused on making clear that Chinese research must earn acclaim for
its intrinsic value, not just because it is a first for the nation. “There is no Chinese space science,”
he says. “Only science.”

Funding for space research remains a concern because it is allocated in five-year cycles, making
it difficult for research communities to mature. But Wu is confident that space science will gain
a steadier source of support — especially if the latest satellites deliver the goods — because both
Chinese politicians and the general public increasingly recognize the importance of scientific dis-
covery. “We are a big nation,” he says. “For human civilization, we should make contributions.”
NANCY IP


MAKING CONNECTIONS. Nancy Ip 
has been building bridges for much of her 
career. Born in Hong Kong, she found her 
calling in science during a graduate degree 
studying neurotransmitters at Harvard Medi- 
cal School in Boston, Massachusetts. Then 
she crossed into the biotechnology industry, 
where she explored the neurotrophic factors 
that support neuron survival and growth. She 
took all that expertise back to her native land 
in 1993, joining the Hong Kong University 
of Science and Technology (HKUST) when it 
was just two years old. 

“It was considered bold” to move to a place 
not known for its research, she says, but she 
wanted to contribute to the region. And since 
then, she’s worked to bolster science and bio- 
technology there through her research and 
leadership. “I sleep very little,” says Ip, who puts
in a 12-hour-plus work day and gives credit to
her support team. “Time flies when you are
doing things that you enjoy.” A lot of that time
is spent with her large research group, which
spans basic neural biology and translational
science for neurological disorders.

Ip has witnessed huge transformations
since her return: Hong Kong transferred 
from British to Chinese control in 1997, and 
she’s seen mainland China’s science scene 
boom. And Ip is now building bridges to the 
mainland, where she hopes to further clinical 
research by accessing large populations of 
people with conditions such as Alzheimer’s 
disease; training people with expertise in 
both clinical medicine and research; and 
playing a leading part in a major brain pro-
ject being developed in China. “I teach my
students, sometimes you don’t know where
research will take you.”
BY HELEN PEARSON


CUI WEICHENG 


DEEP DIVER. Cui Weicheng will never 
forget the dive of his life: riding inside China’s 
Jiaolong submersible as it reached a depth 
of more than 7,000 metres in the Pacific’s 
Mariana Trench 4 years ago. “It’s rather des- 
olate down there — but strangely beautiful,” 
says Cui, who led the submersible project. 

Thanks to Jiaolong, China is now one 
of only a handful of nations that have the 
capability to explore the deep sea. Jiaolong, 
which is named after a mythical sea dragon, 
can travel deeper than any other manned 
research submersible currently in operation 
— allowing the country to reach more than 
99.8% of the ocean floor. 

“This symbolizes China’s increasing 
ambition — and leadership — in deep-sea 
research,” says Jian Lin, a marine geophysi- 
cist at the Woods Hole Oceanographic Institu- 
tion in Massachusetts. Until recently, China’s 
ocean research focused largely on coastal 
and offshore waters. But, driven by a growing 
desire for resources and a stronger position 
in international disputes over marine regions, 
it is stepping up its support for scientific pro-
grammes in the deep ocean.

Now at Shanghai Ocean University, Cui is 
aiming to reach the deepest place on Earth 
— the Challenger Deep valley at the bottom 
of the Mariana Trench, 11,000 metres down. 
To achieve this goal, he is leading an effort to 
build a more-pressure-resistant three-person 
submersible called Rainbow Fish at a cost of 
US$61 million. 

When it is completed in 2020, the vessel 
will be available for use by scientists around 
the world, says Cui. “The oceans belong to 
humanity rather than individual nations.”
BY JANE QIU 


NIENG YAN | CRYSTAL CONNOISSEUR


A structural biologist unlocks 
some problem proteins. 


BY ERIKA CHECK HAYDEN 


As a girl, Nieng Yan read a classic sixteenth-century Chinese novel featuring a monkey
that can transform into other animals. Yan wondered what it would be like to change
herself: “If you could shrink yourself into the size of a molecule or a protein, that would
be a totally different world,” she recalls thinking. Now, as a leading structural biologist, Yan
inhabits that world every day, investigating the way proteins work at the level of atoms. “It was
almost destined that I would become a structural biologist,” she says.

Yan did graduate and postdoctoral research at Princeton University, New Jersey, then set
up her own laboratory at Tsinghua University in Beijing in 2007 when she was 30 years old,
becoming one of the youngest-ever female professors in China. She focused on determining
the structures of proteins embedded in cells’ plasma membranes, which are notoriously dif-
ficult to solve.

One of her targets was the human glucose transporter GLUT1 — a protein that is essential
for supplying energy to cells. Many labs had tried to determine its structure, but the protein
had defied their efforts, in part because it readily changes its shape. Yan used a series of tricks
to restrict its troublesome movements and finally managed to make crystals and solve its
structure in 2014.

“People tried to crystallize GLUT1 for more than 50 years, and all of a sudden, bingo — she
hit it,” says biochemist Ronald Kaback at the University of California, Los Angeles.

Yan's hits have kept on coming, with a series of high-profile structures. She stays up most nights
until 2 or 3 a.m. and skips morning meetings to maximize her time in the lab. Yan has also become
a high-profile advocate for better conditions for women and young scientists.

She is excited about using the latest technologies, such as cryo-electron microscopy, which
for the first time is allowing researchers to study proteins in fine detail in their native environ-
ments, rather than as purified crystals. Yan says that one of the benefits of working in China is
she never has to worry about funding and sees a bright future for structural biology there. “The
sky’s the limit,” she says.


WANG YIFANG | PARTICLE POWER 


A leading high-energy physicist hopes to smash records
with a giant collider.


BY ELIZABETH GIBNEY 


Wang Yifang has a plan to catapult China to the forefront of particle-physics research.
The director of the Beijing-based Institute of High Energy Physics (IHEP) wants to
build a 50-100-kilometre circular particle collider to succeed the 27-km-circumference
Large Hadron Collider (LHC) at CERN, the particle-physics laboratory near Geneva, Switzerland.
The plan is bold, particularly for a country whose biggest existing collider ring is less than
250 metres long. Wang’s plan entails building two machines: the first would explore the Higgs
boson starting in around 2028; its follow-up would occupy the same tunnel and smash particles
with up to seven times the energy of the LHC.

China will have to compete against CERN, which also wants to host a post-LHC machine.
Although China remains the underdog, Wang's scheme has captured increasing support, says
Nima Arkani-Hamed, a theoretical physicist at the Institute for Advanced Study in Princeton,
New Jersey, whom Wang brought on board to lead IHEP’s Centre for Future High Energy Physics
in 2013. “Now it’s not purely fantasy. It has a chance of really happening,” he says.

Wang says that he only dared to pitch the project because of the success of China’s Daya Bay
Reactor Neutrino Experiment. He led that multinational collaboration, which beat international
rivals in 2012 by measuring a parameter that governs transformations in the ghostly particles.

At more than 250 times the price of Daya Bay, the Chinese mega-collider will be a harder sell.
China's government has yet to say whether it will foot the facility’s estimated US$6-billion bill.
Brian Foster, a physicist at the University of Oxford, UK, says that Wang has proved he can get
major projects off the ground and bring in international support.

And one of his best attributes is persistence, says Shing-Tung Yau, a mathematician at Harvard
University in Cambridge, Massachusetts. “He usually succeeds.”


CAIXIA GAO


A gene-editing specialist seeks to make her mark by
improving key agricultural plants.


BY HEIDI LEDFORD


Plant biologist Caixia Gao was initially
reluctant to take up gene editing using
CRISPR-Cas9 — the technique that is
sweeping through biology laboratories around
the world. Her lab had already made muta-
tions in 82 genes using an older technology,
and the thought of switching to something
new was daunting. “At first I felt some resist-
ance,” Gao says. “And then we decided: well
anyway, we have to try.”

After a year of frenzied work, her lab at
the Chinese Academy of Sciences’ Institute
of Genetics and Developmental Biology in
Beijing became the first to use the revolu-
tionarily simple gene-editing technique in
crops, specifically wheat and rice (Q. Shan
et al. Nature Biotechnol. 31, 686-688; 2013).
“If there’s any lesson we learn in genome engi-
neering, it’s that you have to be very flexible
and adapt to technology that changes every
day,” says Daniel Voytas, a plant biologist at
the University of Minnesota in Saint Paul.
“Caixia has that ability to adapt.”

She has been doing that for her whole
career. Gao went to university planning to go
into medicine, but was redirected to agricul-
ture. “Not my interest at all,” she says. “But my
thinking is always: as long as I am in this posi-
tion, I will do my best.” After a PhD in grass-
land ecology, Gao switched again by taking up
plant genetic engineering at the seed company
DLF in Roskilde, Denmark.

Gao had to develop methods for inserting
foreign genes into grass, which was frustrating
work, says Klaus Nielsen, research director at
DLF. Many grasses are difficult to engineer,
and each species — or even genetic variants
within a species — may require its own special
mix of growth conditions. Gao is famously
cheerful, but there were days when Nielsen
could tell that she was seething.

Even so, she pressed on. “Eventually, she
could look in the microscope and see things
no one else could see,” Nielsen says. “She was
cracking the nut every time.”

During Gao’s 12 years at DLF, she cracked
that nut again and again — by genetically alter-
ing several traits, including the times when key
grass species flower. But European suspicion
of genetically engineered crops left her with
little hope that her work would leave the lab.
“It was so difficult to bring a crop to the mar-
ket — in the end, the work cannot inspire you
any more,” she says. That issue, plus a desire
to return with her children to her mother
language and culture, sent her back to China.

In Beijing, Gao tackled genetic engineering
in wheat, a crop that is legendary for being
difficult to work with, in part because many
strains have six copies of the genome. Soon she
was considered one of the best in the world at
engineering wheat, says Voytas.

Gao is happy with her decision to return to
China, where funding for agricultural research
is a higher priority than it is in Europe, she
says. The government has approved some
crops developed with early genetic-engi-
neering techniques, but such approvals have
slowed, and China has yet to decide how it will
regulate gene-edited crops.

Still, Gao is hopeful that some of her
creations will reach the market. Meanwhile, a
disease-resistant wheat engineered in her lab
is being further developed by a company in the
United States. Ever the optimist, Gao refuses to
accept public fears about genetically modified
organisms (GMOs). “If I meet some people
in the street and I ask, they will say they don't
want GMO at all,” she says. “And I stop there
and educate them. They are so surprised.”
QIAOMEI FU


A geneticist uses ancient human remains 
to rewrite Asia’s prehistory. 


BY EWEN CALLAWAY 


iaomei Fu says that she was nervous when she arrived at Germany’s Max Planck Institute 

for Evolutionary Anthropology to pursue a PhD on ancient-human genomics, in 2009. 

Her master’s research in China had focused on the diets of early farmers, and she had no 
experience with ancient DNA, or even genetics. But Fu jumped headfirst into her new field and 
“turned out to be one of the most amazing students we've ever had’, says Svante Paabo, a geneticist 
at the hub for ancient genomics in Leipzig. 

With a trio of Nature papers published in the past 20 months, Fu has helped to redraft the history 
of Europe’ earliest modern humans. She returned to China in January to lead an ancient-DNA lab 
at the Institute of Vertebrate Paleontology and Paleoanthropology (IV PP) in Beijing, where she is 
set to bring the same upheaval to Asia’s ancient past. 

She joined Paabo’s team just as it was putting the finishing touches to a draft Neanderthal genome. 
“Tt was really high pressure. There were a lot of really interesting things, and a lot of scary things 
for me,’ says Fu. “I came there at really the right time.” Fu learned how to harvest the scant DNA in 
ancient bones and quickly picked up evolutionary genetics, bioinformatics and computer program- 
ming to analyse the data that she was generating. 

Her focus soon turned to the early modern humans who settled Eurasia after leaving Africa, 
and Fu began collecting and analysing their bones and teeth. She has sequenced the oldest Homo 
sapiens DNA on record: from a 45,000-year-old thigh bone from Siberia and a 40,000-year-old 
jawbone from a man who had a Neanderthal ancestor in the previous 4-6 generations. Her efforts 
— culminating in a study of 51 individuals who lived between 14,000 and 37,000 years ago — have 
shown that Ice Age Europe was more tumultuous than many had thought, with waves of migrants 
moving in and around the continent and contributing to the ancestry of contemporary Europeans. 

Asia's early history may have been even more dramatic than that, because several groups of 
archaic humans probably coexisted with modern humans, says María Martinón-Torres, a palaeo-
anthropologist at University College London who works in China. Fu will turn her attention to the
first Homo sapiens to settle Asia, who might have arrived more than 100,000 years ago. She also 
hopes to study Asian history as recent as a few thousand years ago — the IVPP has a vast collection
of ancient human bones that have yet to be sampled for DNA. 

Fu is often asked why she returned to China instead of staying in the West. “I’m curious what
happened in China and east Asia,” she responds. “I think it was time to come back.”
QIN WEIJIA


POLAR EXPLORER. When Qin Weijia 
first visited Antarctica in 1989, he fell in love 
with this terra incognita. “It’s a mysterious 
continent, full of unknowns and extremes,” 
says Qin, who is the executive deputy direc- 
tor of the Chinese Arctic and Antarctic 
Administration in Beijing. 

Since then, he has been to the frozen con- 
tinent half a dozen times, including as the 
1996 leader of China’s first inland traverse 
towards Dome A — the highest point and 
one of the least studied regions in Antarctica. 
That was the first of a series of expeditions, 
which culminated in the construction of 
China’s Kunlun station on Dome A in 2009. 

The country is a relative latecomer to
polar research, but the Chinese government 
is investing heavily in both the Arctic and 
Antarctic, driven by the desire for natural 
resources and for a bigger say in interna- 
tional discussions about the regions. 

Last December, an international team 
flew ice-penetrating radar and other sen- 
sors on China’s first fixed-wing aircraft on 
the continent as it traversed back and forth 
across thousands of square kilometres over 
Princess Elizabeth Land in Eastern Antarctica 
to map features under the ice. “It was the first 
survey of its kind in a part of Antarctica we 
know very little about,” says Martin Siegert, a
glaciologist at Imperial College London. “The 
results are spectacular.” 

The team discovered the longest canyon 
on Earth and one of the largest areas of melt 
under the ice sheet, says Qin, who led the 
2015-16 expedition. 

Looking forward, he hopes that China will 
be able to retrieve the oldest ice on the planet 
from Dome A, which will help to uncover the 
history of the Antarctic ice sheets and how 
they have changed. “Only then,” says Qin, 
“can we predict how they will respond to a 
changing climate”. BY JANE QIU


CHEN JINING 


The top environment official 
tackles deadly air. 


BY JEFF TOLLEFSON 


Chen Jining has a tough job. As head of the
Ministry of Environmental Protection,
he is responsible for cleaning up the pol-
lution that blankets China’s cities, contaminates
its drinking water and laces its croplands with
toxic compounds. Although he faces formidable
odds in one of the most polluted countries in
the world, Chen has gained the confidence of
many environmentalists and fellow scientists
in his first 15 months on the job by stepping up
efforts to root out corruption and ensure that
local officials and companies are following rules.

“Local officials are being held more strictly
accountable on the environment quality,” says
Li Yan, who is deputy programme director for
Greenpeace East Asia and works in Beijing. And
because Chen’s efforts to reduce air pollution
often reduce carbon emissions as well, Li says
that the benefits of these reforms extend well
beyond the affected areas. “This has massive
global implications.”

After earning his doctorate at Imperial
College London in 1993, Chen worked his
way through the ranks of Chinese academia
to become president of Tsinghua University in
Beijing in 2012. Now it looks as though he could
become the most powerful Chinese environ-
ment minister in modern times.

His appointment as head of the environment
ministry coincided with a new law that
expanded the agency’s regulatory powers. He
pushed for additional authority to investigate
and prosecute polluters, and in May that request
was granted. This makes it easier for Chen to
intervene when local officials fail to implement
many of the government’s policies on pollution
and development.

In addition to cracking down on pollution,
Chen’s ministry has worked to strengthen
environmental assessments and has boosted
transparency by posting more environmental
monitoring data on its website, including air-
quality readings, as well as information about
its enforcement activities.

Chen has often shunned contact with the
media, but fellow scientists say that he has
been willing to listen to and collaborate with
outside scientists and international experts
on issues such as air quality.

“He once said that the history of China’s
environmental protection is a history of inter-
national cooperation on environment and
development,” says Lailai Li, who heads the
Beijing office of the World Resources Institute,
on whose board Chen once sat.

The minister still faces huge challenges,
however. Citizens are increasingly demanding
that the government clean up the environment,
but China’s rapid industrial rise has created a
backlog of problems. Cleaning up the air in
major cities may be the easiest task facing Chen;
regulators are only beginning to grasp the extent
of the water and soil contamination.

And government authorities continue to
approve industrial projects, even when the envi-
ronmental costs are all too clear, says Dasheng
Liu, an environmental engineer and research
fellow at the Shandong Institute of Environ-
mental Science in Jinan. “He has more power
than before,” Liu says, but he also faces “more
arduous and heavy responsibilities.”


Additional reporting by David Cyranoski.


CHAOYANG LU


QUANTUM WIZARD. When Chaoyang Lu was at
school in a tiny village in Zhejiang province, he fell in love
with physics. “You could figure out how everything works
by a few simple equations,” he says.

Now Lu is a rising star in China’s push to master quantum information technology — which
could eventually lead to powerful new types of computing and secure communications. The
33-year-old, a physicist at the University of Science and Technology of China in Hefei, is noted for
his work with ‘entanglement’, in which the quantum states of different particles are linked regard-
less of how far apart they are. He has entangled eight photons at once — a world record — and
has submitted work using ten. Those achievements led Anton Zeilinger, a quantum physicist at the
Vienna Center for Quantum Science and Technology, to call Lu a “wizard of entangled photons”.
He has also done groundbreaking work with his mentor, Pan Jian-Wei, in the related phenomenon
of quantum teleportation, in which a quantum state is transported from one particle to another.

It was Pan who encouraged Lu to do his PhD work at the University of Cambridge, UK, and who
convinced him to return to China with the promise that the government is investing heavily in
quantum information technologies, and that bright young physicists could focus on research rather
than funding. Lu’s goal is to advance quantum entanglement enough to use it for computations.
“It will be exciting to see, for the first time, a task where a quantum machine can do a better job
than a classical one can,” says Lu. BY M. MITCHELL WALDROP


THE SEQUENCING SUPERPOWER 


First China conquered DNA sequencing. 
Now it wants to dominate precision medicine too. 


Six years ago, China became the global
leader in DNA sequencing — and it was all
down to one company, BGI. The Shenzhen-
based firm had just purchased 128 of the
world’s fastest sequencing machines and was 
said to have more than half the world’s capacity 
for decoding DNA. It was assembling an army 
of upstart young bioinformaticians, collabo- 
rating with leading researchers worldwide and 
publishing the sequences of creatures rang- 
ing from ancient humans to the giant panda. 
The firm was quickly gaining a reputation as 
a brute-force genome factory — more brawn 
than brains, said some. 

Six years later, the scene is quite different. 
BGI’s most famous scientist and visionary 
leader, Jun Wang, left last July. The machine 
that had given the company its dominance is 
outdated, and the firm’s attempt to develop its 
own industrial-scale whole-genome sequencer 
hit a roadblock last November, forcing it to lay 
off employees at its US subsidiary. Meanwhile,
the competing system — Illumina’s X series — 
has been selling briskly, raising the speed and 
dropping the price of sequencing worldwide. 
Armed with the latest sequencers, rival 
companies to BGI have emerged. Most 
prominent of these is Novogene in Beijing, 
founded in 2011 by former BGI vice-presi-
dent Ruiqiang Li. And although BGI might
not have the uncontested dominance it once 
did, it still claims to have the world’s largest 
sequencing capacity as well as major scien- 
tific ambitions — including to sequence the 
genomes of one million people, one million 
plants and animals and one million microbial 
ecosystems. Today, China is being reborn as a 
sequencing power with a broader base.

Fuelling the drive is a multibillion-dollar, 
15-year precision-medicine initiative, which 
China announced in March and which rivals 
a similar initiative in the United States. If these 
efforts fulfil their goals, doctors envision being 
able to use a person’s genome and physiology 
to pick the best treatments for his or her dis- 
ease. The goal now for sequencing companies 
is to turn the bounty of genomic data into 
medical benefits. 

To do that, sequence data alone are not 
enough — so some Chinese companies are 
going beyond brute-force sequencing to work 
out how lifestyle factors such as diet are also 
important for understanding disease risk and 
for finding therapies. “The thing about China is
the ambition they have for their precision-med-
icine programme is orders of magnitude larger
than the United States,” says Hannes Smarason,
chief operating officer and co-founder of
WuXi NextCODE, a genomics company in
Cambridge, Massachusetts, that is part of
Shanghai-based WuXi AppTec. “They are dynamic
and receptive. There, the idea of integrating of
genomics into health care is very real.”


Scientists at sequencing giant BGI in Shenzhen are looking to apply their
genetic expertise to medicine. SIM CHI YIN/VII


RISE OF THE MACHINE 

The new energy behind sequencing is largely 
thanks to one machine: Illumina’s HiSeq X 
Ten, so called because it is generally sold as 
sets of ten units. When the machine hit the 
market in 2014, one set was able to sequence 
a human genome for close to US$1,000, and 
power through some 18,000 human genomes 
per year. Companies that wanted to rival BGI 
saw an opportunity — and leapt. 

Novogene was the first. Following a model 
similar to BGI’s, Li has been building up a large
staff of bioinformaticians to generate and inter-
pret sequence data as part of collaborative basic-
research projects on the snub-nosed monkey
(Rhinopithecus roxellana), cotton (Gossypium
hirsutum) and other plants and animals. Using 
the same machine, a handful of other compa- 
nies — including WuXi PharmaTech and Cloud 
Health, both in Shanghai — focus more on 
offering sequencing as a service to pharmaceu- 
tical or personal-genomics companies. 

The growth is accelerating. Novogene added 
a second X Ten set in April, and Cloud Health 
chief executive Jason Gang Jin says that the 
company will add another two sets this year. By 
the end of the year, China will probably have 
at least 70 units. (Illumina says that 300 units 
were sold worldwide by the end of last year.) 

BGI has been trying to keep pace. In 2013, 
it purchased Complete Genomics in Moun- 
tain View, California, in a bid to create its own 
advanced sequencing machines for in-house 
use and for sale. The firm announced a sys- 
tem called Revolocity, its attempt to match the 
HiSeq X, last June. But in November, having 
taken just three orders, it suddenly suspended 
sales. BGI is now left with its ageing fleet of 
128 Illumina HiSeq 2000 machines and a 
mélange of newer sequencers from various 
companies, including its own. 

Estimates of China’s share of the world’s
sequencing capacity range from 20% to
30% — still lower than when BGI was in its 
heyday, but expected to increase fast. “Sequenc- 
ing capacity is rising rapidly everywhere, but 
it’s rising more rapidly in China than anywhere 
else,” says Richard Daly, chief executive of 
DNAnexus in Mountain View, which supplies 
cloud platforms for large-scale genomics. 

BGI has another machine up its sleeve. The 
BGISEQ-500 is designed as more of a desktop 
instrument for research labs. It is also based on 
the Complete Genomics technology and is set 
to begin shipping this year. Yiwu He, BGI's new 
global head of research, says that the system
can sequence a human genome for $1,000, and
by being smaller in scale and more flexible to
use, it will meet China’s emerging need for clin-
ical sequencing. “There will be more sequenc-
ing done outside of research institutes, in the
hospitals,” says He. The company will bring the
price of one human genome sequence down to 
$200 in the next few years, he predicts boldly. 
“China is the most exciting place to do bio- 
medical research.” 


GENOMES EN MASSE 
The announcement of the precision-medicine 
programme sent a ripple of excitement through 
China's sequencing giants. The money will be 
spent on improving technologies, sequencing, 
and sharing and analysing more than one mil- 
lion human genomes, as well as on developing 
drugs and diagnostics from the data and using 
those findings to personalize medical care. 
Hungry for a share of the cash, hospitals and 
clinicians are teaming up with sequencing com- 
panies to come up with proposals for the work. 
The one million human genomes will
be split among a variety of studies, and will
include groups of 50,000 people who each have
metabolic disease, breast cancer, gut cancer or
another condition. There will also be cohorts
that represent northern, central and southern
China, “to look at the different genetic back-
grounds of subpopulations”, says Li.

Similar projects exist elsewhere, including
one in the United Kingdom that is sequenc-
ing 100,000 genomes, and one in the United
States that has a budget of $215 million and
aims to cover one million genomes. But China
will have some advantages, observers say, not
least of which is firm backing from the gov-
ernment. Over the next five years, the govern-
ment has promised to add several precision
drugs and molecular-diagnosis products to
the national medical-insurance list, ensuring
that companies’ research costs will be recouped
if they lead to such a product. In the United
States, biotech companies with new products
can struggle to get insurance companies or the
government to pay. “There is greater accept-
ance of sequencing and willingness to invest
in it in China,” says Daly.

In September, BGI will open the China
National Genebank, a five-hectare facility in
Shenzhen that will house millions of samples
from people, animals, plants and microbes.
Entrusted to BGI by the central government,
the bank will make some samples and data
available to researchers around the world.
And the company is compiling its own data-
base of one million human genomes, which
will overlap to some degree with the national
project. BGI will hit the target before anyone
else in the world, predicts He. “We can get
there faster because of our partnership with
the government, hospitals, universities,
because we can move faster than large con-
sortia, and particularly because we have our
own sequencer. That is a huge advantage,” he
says.

But making sense of one million human
genomes is a major challenge, says Wang, who
quit BGI to found a company called iCar-
bonX in Shenzhen. The firm plans to collect
sequencing data for more than a million people
“as a start”, as well as other biological informa-
tion, including changes in levels of proteins
and metabolites, brain imaging, biosensors to
monitor things such as glucose levels and even
the use of smart toilets that will allow real-time
monitoring of urine and faeces. Wang calls it a
“digital form of you”. He plans to use artificial
intelligence to integrate all the data, with the
ultimate aim of providing medical care that is
tuned to an individual’s genes and physiologi-
cal state. Less than a year in, Wang has raised
more than $100 million, including a big chunk
from Shenzhen-based Tencent, the company
behind the social-media application WeChat,
which Wang says will help to build the data-
collection platform.

China is already exploring how else
genomics can benefit health. In March, BGI 
celebrated its one millionth NIFTY test — a 
screen that sequences fetal DNA circulating 
in the mother’s blood to detect chromosomal 
abnormalities such as Down’s syndrome.
The country’s conversion from a one-child 
to two-child policy is expected to accelerate 
the demand for such tests. Cancer genetics is 
also well on its way. Cloud Health last year fed 
genomic data from some 15,000 tumour sam- 
ples to more than 100 genetics companies in 
China to help with diagnosis and make sure 
that patients get the right chemotherapy drugs. 
The market for pricey genomic tests is growing 
in step with the country’s middle class. “China 
has 100 million people making more than 
$50,000 now,” says Daly.

For Wang, sequencing on its own is old hat. 
“Genomics is important, but it’s just one piece 
of the puzzle,” he says. “All the complex traits.
All the neurodegenerative disorders, cancer, 
diabetes — it’s all more than genetics. If we 
only talk about genomics, about massive data 
without clinical info, that’s not enough.”


David Cyranoski writes for Nature from 
Shanghai, China. 
Baby boom: following the end of China’s one-child policy, the private fertility sector needs close scrutiny.


No wild east 


China has lessons for the world when it comes to 
overseeing ethically sensitive research in the life 
sciences, argue Douglas Sipp and Duanqing Pei.
The first and only published papers
to describe genome modification
in human embryos have come from
Chinese laboratories. For some, this is
another signal of China’s successful trans- 
formation from a closed society focused 
on farming and the manufacturing of com- 
modities to a world leader in innovation. 
For others, these studies are the latest in a 
list of feats reported over the past decade 
that reflect the country’s lax regulation or 
cultural indifference to fundamental ethical 
tensions. 

In our view, fears that China’s scientific 
ambitions are overwhelming its ability 
to exercise appropriate caution in the life 
sciences — particularly in research involv- 
ing human embryos — are overblown. In 
fact, China has shown care and restraint with 
respect to altering the genomes of human 
eggs, sperm or embryos, and in the use of 
human embryos in research more broadly. 

Major challenges lie ahead, particularly in 
the commercial application of biotechnol- 
ogy. But as international standards evolve to 
keep pace with rapid advances in research, 
China should be encouraged to take its 
place as a fellow pioneer alongside longer- 
established research superpowers — both in 
the laboratory and in regulation. 


TRIAL BY MEDIA 

The first study to report the modification 
of the genomes of human embryos was 
rejected by Nature and by Science, report-
edly in part because of peer reviewers’ ethi-
cal apprehensions. And media accounts
of both papers in the United States and
Europe often depicted the work as the pur-
suit of progress unchecked by principle (see
go.nature.com/1lukutpw and
go.nature.com/1tmicqx).

It is not unusual for research led by labs 
in China to be cast in such a light. In the 
early 2000s, Chinese investigators trans-
ferred the nuclei of human skin cells to
cultured rabbit egg cells in an attempt to
produce humanized stem cells. Like the
first gene-editing paper, that study was ini-
tially rejected partly because of ethical
concerns and received intense media
scrutiny when published. More recently, 
BGI, a private genomics company in Shen- 
zhen, sought to associate particular genetic 
sequences with intelligence by sequencing 
the genomes of volunteers with ‘high cog-
nitive ability’. The study, which began in
2012, provoked concern about a supposed 
interest in eugenics in China despite assur- 
ances from the institute that the aim was 
purely to further basic understanding of 
the genetic basis of high IQ and that, in 
any case, reproductive applications would 
not be allowed under existing guidelines.

Studies such as these would rightly
provoke discussion wherever they were
conducted. But all too often the intima-
tion is that Chinese scientists are free to
do anything and are
a step away from making designer babies.
What is more, commentators, both in China
and outside it, often assume that scientists
and others in China have little concern
about the fate of early human embryos.

Even a cursory review of China’s existing
regulations, as well as its research and social
norms, shows that this picture is fundamen-
tally inaccurate.


PRINCIPLED PROGRESS 

National guidelines on assisted reproduc-
tion and embryonic-stem-cell research
have precluded the implantation of modi- 
fied human embryos for reproductive pur- 
poses since 2001. In China, going against 
government guidelines can incur financial 
penalties and loss of employment as well as 
loss of funding and licences to do research. 
Thus, although not encoded in law, the 
ruling that research on human embryos is 
permitted, but that the transfer of modified 
embryos to a woman's uterus is not, has 
been described as a “Rubicon” for China’s 
research and medical communities. It has
not been breached in the 15 years since the 
2001 guidelines were written. 

Both of the labs that described genome 
editing in human embryos obtained
approval for their studies from institutional 
review boards. They also used non-viable 
one-cell embryos that had been discarded 
by in vitro fertilization (IVF) clinics and 
that were incapable of developing to term. 
Furthermore, they discontinued their work 
on discovering that the genome-editing 
process was unexpectedly inefficient. None 
of these steps suggests a cavalier approach 
to research involving human embryos. 

There is also little evidence for a noncha- 
lant attitude among Chinese citizens towards 
the use of human embryos in research. 
Contrary to common perceptions, fami-
lies using IVF in China have generally been
conservative in their handling of surplus
embryos. Even under the one-child policy (a
public policy that was lifted only in January
this year), 83% of surveyed Chinese families
using IVF opted to keep surplus embryos in
storage for 0-3 years after the birth of their
baby, with many citing feelings of attach-
ment to the embryos as their reason (see
‘To store or not to store’). Around 63% of
families similarly surveyed in the United
States chose to keep embryos in storage for
0-5 years after having a baby using IVF.
Getting a handle on China’s complex 
regulatory systems can be daunting for 
non-Chinese speakers and, when it comes 
to implementation, sometimes even for the 
domestic community. Indeed, China would 
almost certainly earn more international 
regard if it made more effort to publicize its 
regulatory framework. Many of the nation’s 
guidelines are buried in hard-to-navigate 
agency websites, and official English trans- 
lations are scarce, making informed dis- 
cussion by foreign scholars difficult. Yet, 
in relation to the use of human embryos 
in research, China’s approach has arguably 
been more effective and enabling than the 
legal patchwork seen in much of the world. 
For years, stem-cell researchers in the 
United States have faced uncertainty over 
the future of the field and over whether a 
given cell line would remain usable in fed- 
erally funded work. That confusion has 
been compounded by differences between
states; currently, the use of embryonic stem
cells may be legal in one state but a crime
just across the state line. Other countries,
including Australia, Brazil and Japan, have
similarly struggled to develop embryo-
research policies, leaving scientists in
limbo for many years.


TO STORE OR NOT TO STORE

Families in China and the United States who
have had babies using in vitro fertilization (IVF)
make similar choices about whether to
continue to store frozen surplus embryos.

The binding force of government guide- 
lines, combined with China's consistent posi- 
tion — prohibiting uses in reproduction but 
permitting those in research — has given sci- 
entists the confidence to pursue studies in a 
well-defined ‘safe space’. Indeed, the signifi-
cant gains that China has made, particularly
in relation to gene editing, are thanks in
part to this clarity. 


PRIVATE SECTOR 

A major question now is whether ministerial 
guidelines will suffice in China's nascent pri- 
vate biotech sector. 

In 2009, 2012 and 2015, the Ministry of 
Health (MOH) and the China Food and 
Drug Administration (formerly the SFDA) 
introduced guidelines for stem-cell thera- 
pies. Early attempts to rein in private clin- 
ics ran into difficulty, in part due to the 
diversity of regulatory jurisdictions across 
regions and cities, and inconsistencies in 
compliance. Wherever there are ambigui- 
ties, enforcement can become challeng- 
ing. Stem-cell biology is a young science 
and its clinical application is a therapeutic 
frontier, and so both regulators and the 
regulated lack experience. Indeed, the pro- 
cess of ensuring that guidelines are being 
followed has often become a matter of 
discretion for individual government agen- 
cies — a situation that is ripe for abuse. 

The National Health and Family Plan- 
ning Commission and China Food and 
Drug Administration this year established 
a panel of experts to evaluate centres that 
seek to perform stem-cell-based clinical 
trials. Guidelines published by these bod- 
ies state that any organization wanting to 
pursue such trials should first undergo this 
evaluation (see go.nature.com/ltvtjcw; in 
Chinese). 

The death in April of a cancer patient 
who had received an ineffective cell therapy 
caused an outcry on social media in China, 
which prompted a government order to hos- 
pitals not to outsource medical services. The 
new evaluation procedures, combined with 
an increasingly aware public acting as watch- 
dog, should make it harder for providers to 
market unapproved treatments. 

Another concern is that rising demand 
for reproductive medicine following the 
relaxation of the one-child policy may 
expand markets for IVF and other services 
(see go.nature.com/luh4rep). Currently, all 
IVF clinics require a licence in China. But a
potentially more profitable private sector in 
this area warrants close scrutiny. New laws 
or guidelines may be needed to check the
marketing of certain services. 

Many countries face the challenge of 
developing effective policies that both 
respect the ethical standpoint of diverse 
publics and enable the exploration and 
application of biomedical technologies. 
China should be given an equal voice 
in the global discussion about how best 
to achieve this. Encouragingly, more 
dialogue is starting to happen. In an 
unprecedented move, the Chinese Acad- 
emy of Sciences joined the US and UK 
scientific academies in organizing the 
first international summit on human 
gene editing last year. 

Establishing appropriate governance 
for research in the life sciences is hard for 
everyone given globalization, the pace of 
technological advances, the complexity 
of domestic regulatory ecosystems and a 
growing international movement to make 
deregulated markets — not government 
officials or bioethicists — the arbiters of 
quality and ethicality. We must therefore 
strive for a better understanding on all 
sides of the efforts that different coun- 
tries are making, and of how they can 
work together to develop a consensus 
on international governance. Good rules 
drive good science.
The Five-hundred-meter Aperture Spherical Telescope in Guizhou is due to be completed in September.


Boost basic 
research in China 


Improving the quality, integrity and applicability of 
scientific research will underpin long-term economic 
growth, writes Wei Yang. 


The new economy relies on innovation.
Developing technologies, improving
efficiency and creating and imple-
menting new scientific knowledge can
invigorate industry and help society. Chi- 
na’s recent economic slowdown, however, 
calls for a gear change in how the nation 
innovates. 

For several decades, short-term and 
focused technological research and develop- 
ment (R&D) has been the main driver in 
China. Large public grants were channelled 
to promising or urgent areas to deliver new 
turbine engines, high-speed trains, solar 
panels or drugs in 5-10 years. Now China 
must take a longer and broader view, and 
nurture its science roots. 

Basic research — studies that create sci- 
entific knowledge and technologies that can 
be subsequently developed, translated or 
applied — has a conflicted image in China. 
Progress has been enormous (see Nature 
481, 420; 2012): China’s share of research 
papers worldwide (as counted in Elsevier's 
Scopus database) grew from 2.5% in 1997 to
18.8% in 2015 — but severe criticisms persist 
(see ref. 1 and Nature 463, 142-143; 2010). 
For example, critics say that China's uni- 
versities have become paper mills induced 
by metrics that value quantity over quality. 
Impact remains low: few chemical reactions 
or processes are named after Chinese schol- 
ars, even though the nation now publishes 
more papers in chemistry than any other. 
Research misconduct — including ghost- 
writing and reviewing — has been rife, as 
evidenced by retractions of papers by Chi- 
nese authors from BioMed Central, Elsevier 
and Springer journals in the past two years. 
Industrialists and some government offic- 
ers complain that many academic studies, 
such as in pure mathematics or fundamental 
physics, are irrelevant to the nation’s economy 
or society. Scientific and technological pro- 
gress contributed to only 55% of economic 
growth in China in 2015, compared with 
88% in the United States in the same period. 
And China spends relatively little of its total 
R&D budget (public, industrial and private) 
on basic research — just 4.7% in China com- 
pared with 24.1% in France, 17.6% in the 
United States and 12.6% in Japan in 2013. 
Improving the quality and integrity
of basic research must be the focus of
national efforts to boost innovation in 
China. Quality needs to matter more than 
quantity, and integrity is the best way to 
ensure quality. Applicability to techno- 
logical development justifies drawing more 
resources into basic research. As the head of 
the National Natural Science Foundation 
of China (NSFC), a leading national gov- 
ernment funding agency for basic-science 
research, I call for a sustained focus to bring 
about such a change by 2020. 


EMERGING POWERHOUSE 

China is rising rapidly up the global scien- 
tific ranks by every measure — quantity and 
quality of output, R&D spend and increased 
collaboration (see page 452). For example, 
China's share of high-impact works (the top 
0.1% of papers in Scopus rated by citations) 
has grown, from less than 1% in 1997 to 
about 20% now. 

Founded in 1986 with a starting budget 
of just 80 million yuan 
(US$12.2 million), the NSFC has expanded 
more than 300-fold to allocate 24.8 billion 
yuan in 2016. It funded 62.1% of Chinese 
research papers, or 11.5% of global aca- 
demic output, in 2015. The foundation's 
mission is to be a ‘FRIEND’ of scientists: fair 
in reviews; rewarding in fostering research; 
international in global participation; effi- 
cient in management; numerous in grants; 
and diversified in disciplinary coverage. 

But beyond the buoyant statistics, basic 
research in China has been slow to develop. 
For example, there is only one science Nobel 
laureate from mainland China. And the 
nation’s research lags behind other countries 
in terms of citations — its Field Weighted 
Citation Impact measure was 0.86 in 2015, 
below the world average of 1.0. 

Raising the bar on quality — higher cita- 
tions and more major breakthroughs — must 
be the top priority. Put another way, China 
needs to raise the altitude of its basic research 
landscape and form high mountains. 

Agreed metrics are needed to track pro- 
gress. Current measures are heterogeneous 
and do not work equally well across China's 
vast and diverse academic landscape. The 
country has 1,000 research institutions capa- 
ble of basic research, each with a different 
focus, and more than 1,000 universities, each 
with a different blend of research and teach- 
ing. For example, Tsinghua University in 
Beijing receives nearly 5 billion yuan in 
annual research grants from all sources, 
whereas some regional colleges have research 
budgets of only a few million yuan per year. 
Measuring publication numbers might work 
well for a young institute that publishes 
ten papers a year in relevant international 
journals, but may eventually distort the 
disciplinary mix of a large university that publishes 
10,000 papers a year in diverse journals. 

So, in practice, each institute must decide 
for itself what is most important to track. It 
might choose to look at whether a project 
or person is producing many publications, 
whether the work has high impact or is 
highly cited, or whether a project is globally significant 
or is a major scientific breakthrough. 
Each institute must plot a trajectory that is 
consistent with its history, current status and 
future goal. Evaluation needs must be reas- 
sessed as a project matures, and as the insti- 
tute upgrades. In most cases, when institutes 
are managing their progress healthily, this 
‘soft’ approach will work. Interventions such 
as campaigns to reward high-quality work 
might be needed for those that deviate from 
the research commonwealth. 

Universities need to implement metrics 
wisely and clarify their aims (see D. Hicks 
et al. Nature 520, 429-431; 2015). The Chi- 
nese Academy of Sciences (CAS) has taken 
a lead: some 15 years ago, it was the first in 
China to include citation in its assessment 
metrics, leading to an exponential growth in 
high-impact works. And three years ago, CAS 
directed each of its 104 research institutes to 
concentrate on one mission, three near-future 
breakthroughs and five long-term directions. 

But setting targets that are too rigid can 
skew or hinder research. More institutes 
recognize that emphasizing publication 
numbers pressures researchers to write lots 
of incremental papers rather than a few good 
ones. Merit-based academic evaluations — 
that account for international recognition, 
representative works and impacts to the field 
— can avoid this. Long-term development, 
which may be slow but steady, must be dis- 
tinguished from short-term gains that lack 
sustainability. Many universities and research 
institutes are downsizing the proportion of 
researchers’ salaries that are based on performance 
(from more than 70% to less than 30% 
in extreme cases), so that a higher percentage 
goes towards rewarding merit. 


MISCONDUCT ALLEGATIONS FALL 

China is on a long march to research integrity, 
as shown by data from funding agency NSFC. 

Another question is how best to apportion 
basic research funds. Should science address 
societal ‘grand challenges’ or test bold con- 
cepts? Should resources be pooled or shared 
among many individuals? One answer is to 
cover several bases. For example, the NSFC 
invests 70% of its funding in blue-skies 
research, 10% on supporting talent and 20% 
on major research projects for scientific chal- 
lenges and new research facilities. 

Later this year, the ministry of education 
will launch a blueprint for a ‘double excel- 
lence initiative’ to drive China’s universities 
and academic programmes towards world- 
class standards, such as by assembling high- 
quality teams. It is likely that the evaluation 
for universities will change to reward the 
achievement of a few top-quality departments 
rather than many average ones — with similar 
goals to the UK Research Excellence Frame- 
work. Many universities are adjusting their 
academic structures and realigning leading 
researchers in anticipation. 


REINFORCE INTEGRITY 

China is enduring a long march to research 
integrity. The United States tops the league 
table of retracted papers (see retractionwatch.com), 
partly owing to its formidable quantity 
of scientific publications, but retractions from 
China are growing. The countries have taken 
slightly different educational approaches to 
reinforcing integrity. In China, research 
misconduct tends to be portrayed in black- 
and-white terms — scientists are either on 
the moral high ground or cast into the ethical 
abyss. In the United States, educators analyse 
grey areas by discussing case studies with 
early-career researchers in class. Both coun- 
tries can learn from each other. 

Research misconduct in China is driven by 
several forces. These include competition 
(owing to the rapid expansion of researcher 
numbers) as well as assessment criteria — 
the need to publish in international journals 
encourages the use of language services or 
ghostwriters, and quantification encourages 
research outputs to be split up and published 
separately (known as salami-slicing). Other 
drivers include strengthening of ethical values 
such as animal rights, and insufficient provi- 
sion of ethical codes in areas such as genetics 
and big data. 

For the past decade, the China Association 
for Science and Technology, the education 
ministry and the NSFC have run a well- 
publicized anti-misconduct campaign in 
the Chinese scientific community. It is having 
results (see ‘Misconduct allegations fall’). 
Most research institutions now have 
procedures in place to tackle suspected or 
confirmed ethical breaches, and a zero- 
tolerance policy has been enforced in some, 
such as Zhejiang University in Hangzhou (see 
Nature 481, 134-136; 2012). The proportion 
of allegations of misconduct is declining 
even though more attention is being given to 
actively detecting cases. Similarities between 
submitted proposals and published dissertations 
are also going down. The culture of new 
researchers is changing from ‘why not cheat’ 
to ‘it is not worth getting caught’. 

China’s DAMPE satellite launched in December 2015; others are scheduled to follow this year. 

To go further requires changes on three 
fronts: attitude, structure and methodology. 

A change in attitude — from covering up 
misconduct to exposing it — is essential. The 
NSFC is implementing a similarity check 
for submitted grant proposals and now 
publishes an annual press release detailing 
notorious misconduct cases. It is also inves- 
tigating cases of ghostwriting and reviewing. 
Since 2000, we have evolved our policies on 
information handling, from guarding review 
confidentiality to transparency in research 
evaluation. Each panel is required to moni- 
tor the healthy conduct of review in its disci- 
pline, by voting on the fairness of their fellow 
panellists’ judgements, for example. 

Structural changes within institutions 
are crucial to separate administrative and 
academic powers and prevent corruption. 
For instance, the NSFC has exercised vari- 
ous practices that might be applied in other 
funding agencies. Agency administrators 
are no longer involved in academic reviews. 
NSFC staff members are only authorized to 
access information that is relevant to their 
duties, and an independent council of senior 
academics has been set up to counterbalance 
the administrators. In many institutions, 
external advisers are now used to avoid con- 
flicts of interest, academic committees are 
being given more power, and committees 
have been formed to safeguard research and 
clinical ethics. 

Methodology changes can remove the 
soil that nourishes research misconduct. A 
nationwide campaign against overly quantified 
measures of research is under way. Caps 
on human-resource costs are being lifted. A 
streamlined funding architecture needs to 
be achieved, which reduces fragmentation 
in grant sources and mandates that all grant 
reviews are conducted by professional institu- 
tions selected by a joint committee rather than 
by administrators. We also need to use more 
external reviews by international peers, and 
account for indirect research costs. 


PRIORITY TOPICS 
Which areas of basic science look most 
promising to develop in the next five years? 

The NSFC’s plan for 2016-20 includes a 
list of areas, breakthroughs and interdisci- 
plinary hotspots in which China could 
deliver fast. Examples are the ‘Langlands 
programme’ for mathematics that links 
number theory, geometry, analysis and 
theoretical physics, such as at the Academy 
of Mathematics and Systems Science, CAS; 
and the deep underground Earth-physics 
laboratory near Jinping, Sichuan, that might 
detect dark matter. There is also the Five- 
hundred-meter Aperture Spherical Tele- 
scope (FAST) in Guizhou, southwest China, 
which is due to be completed in September; 
and 24 scientific satellites planned for the 
next 5 years (4 of which are due to launch 
this year) that will advance astrophysics, 
cosmology and Earth sciences. 

Other promising areas and institutes 
include molecular chemistry and quantum 
catalysis for chemistry, which is a focus of the 
Dalian Institute of Chemical Physics at CAS; 
quantum computing for information science 
at the University of Science and Technology 
of China in Hefei; and neural circuits and 
brain science in Shanghai’s biomedical sci- 
ences and innovation complex. The National 
Center for Protein Sciences (the PHOENIX 
Center) in Beijing and Shanghai is focusing 
on proteomics; and teams are working on 
gene editing, molecular approaches to cancer, 
and infectious diseases. A multidisciplinary 
effort is needed to stimulate the country’s 
‘green science’ — Earth, ocean and environ- 
mental science. 


MUCH TO DO 

For basic science innovations to benefit the 
economy, the full chain of development — 
from the initial research to technology, prod- 
ucts and the market — must be nurtured. Not 
all research will bear fruit beyond the lab; 
some is curiosity-driven. But where possi- 
ble, new knowledge should either be turned 
into technology or translated from one field 
to others. 

The NSFC is feeding the source. The rest 
of the chain is being encouraged by the Min- 
istry of Science and Technology, through its 
major National Initiatives for Technology 
and Engineering (16 of which run to 2020 
and 15 that extend to 2030) and National Key 
Research Projects (36 launched this year). 
These programmes link researchers, develop- 
ers and venture capitalists. Examples include 
addressing air pollution, increasing the use of 
low-carbon energy in chemical engineering 
and deep-sea stations for ocean exploration. 

Barriers between research and commercial 
development are being dismantled by new 
policies. These include the recent revisions 
of knowledge-transfer laws, which assign 
the benefits of public-funded projects to the 
researchers and their institutions (similar to 
the US Bayh-Dole Act). Researchers thus 
gain incentives of fame and wealth. 

In summary, four issues need attention. 
First, we must incentivize, not discourage, 
Chinese scientists for making big scientific 
breakthroughs. These take time and endur- 
ance, as the recent detection of gravitational 
waves illustrates. Areas such as basic physics 
and astronomy need master plans for long- 
term development. 

Second, we must develop and adopt an 
assessment strategy using appropriate met- 
rics for evaluating merit. 

Third, we must create a healthy, congenial 
academic ecology. We should let researchers 
spend time on research, rather than overload 
them with paperwork or leave them to fend 
off allegations and slog over grant finances. 

Finally, we must devise a business model 
for China that identifies and cultivates appli- 
cable research findings. There are many miles 
to go before we rest. 


BOOKS & ARTS 


John Maynard Keynes (right, with Henry Morgenthau) laid the foundations for the GDP metric. 


GDP in the dock 


Diane Coyle savours a history of the long-standing 
economic measure and possible alternatives. 


Since its invention during the Second 
World War as a thermometer of economic 
health, gross domestic product 
(GDP) has become a familiar incantation in 
claims and counter-claims about the well-being 
of nations. Some environmentalists 
and feminists were early critics, but until 
recent decades, few others questioned it. 
Now, campaigners ranging from left-wing 
Nobel-prizewinning economist Joseph 
Stiglitz to the free-market Economist 
magazine want to replace GDP with direct 
measurement of human well-being. The 
technology industry has joined them, 
bemoaning the failure of GDP to account 
properly for digital technologies, including 
free online services, because the relevant 
statistics are not collected … books about economic … 
coalition is shaping up in favour of moving 
away from GDP. The question is what 
to use instead. 


The Great Invention: The Story of GDP and 
the Making (and Unmaking) of the Modern World 
EHSAN MASOOD 
Pegasus: 2016. 


In The Great Invention, Ehsan Masood, 
editor of policy periodicals Research Europe 
and Research Fortnight, argues for an improved GDP. Into this 
single metric for economic-activity indicators 
— defined as the monetary value of all 
goods and services produced in a country — 
he would combine environmental impacts 
and human well-being. His book traces the 
history of GDP since its creation, as well 
as the calls for alternatives, mainly from 
environmentalists. Masood agrees with the 
sentiment of suggestions to use ‘dashboards’ 
that incorporate other economic data and 
supplementary indicators, but he concludes 
that GDP matters. As he writes of countries 
that adopted it: “The act of measuring their 
economies would ultimately determine how 
their economies would be managed.” And it 
matters despite, or because of, its flaws. GDP 
is too entrenched to be successfully replaced, 
he finds; instead, it needs radical reform. 


FORMATIVE FACTORS 

GDP began, as Masood notes, as an aggre- 
gate measure when the need arose for 
governments to manage economies during 
the Depression in the 1930s and the Sec- 
ond World War. Pioneers of the statistics 
involved, such as US economist and Nobel 
laureate Simon Kuznets, intended to create 
a metric to meaningfully capture a society's 
economic welfare. 

There were other formative factors at 
work. One was the need to avoid suggest- 
ing that the war effort was reducing welfare. 
Another was the thinking of influential 
British economist John Maynard Keynes, 
as set out in his 1936 The General Theory 
of Employment, Interest and Money. Keynes 
theorized that raising aggregate demand 
or total spending in the economy through 
government expenditure can avoid the sort 
of mass unemployment that was seen in 
the Depression by stimulating growth and 
improving stability. 
He and his supporters 
were determined to 
make the new metric 
serve that govern- 
ment role by defining 
federal spending as a key component of the 
equation, along with consumer spending 
and investment. Thus, GDP was born as a 
transatlantic effort, led by Keynes's assis- 
tants in the UK treasury, Richard Stone and 
James Meade. By the end of the 1940s, it was 
standardized through the United Nations, 
and the same international process is in 
place today. 

Masood covers decades of challenges to 
GDP conventions that make for a fascinat- 
ing institutional and human story. Those 
seeking an alternative included UN official 
Maurice Strong, a key figure in the 1972 UN 
Conference on the Human Environment 
and the 1992 Earth Summit. Other critics 
were Italian industrialist Aurelio Peccei and 
British civil servant Alexander King, who 
together established think tank the Club 
… that GDP doesn’t 
take into account how natural assets are 
depleted to generate current income 
and consumption. Several proposals for 
new models have underlined the need to 
account for the environment; Masood (an 
erstwhile Nature journalist) praises a 1997 
paper on these proposals co-authored by 
economist Robert Costanza (R. Costanza 
et al. Nature 387, 253-260; 1997). Economists 
James Tobin and William 


Amartya Sen created the UN Human 
Development Index with Mahbub ul Haq. 


Books in brief 


Code Warriors 

Stephen Budiansky KNOPF (2016) 

Code fiend and writer Stephen Budiansky’s history of the US National 
Security Agency (NSA) and its intelligence battles with the Soviet Union 
opens in 2013, as whistle-blower Edward Snowden enacts his long- 
planned exposé. In a narrative laced with cryptanalysis, Budiansky 
then tacks back through the NSA’s turbulent history, from the “almost 
insane logical disconnects” of the cold war stand-off to the fall of the 
Berlin Wall. This is a balanced, authoritative portrait of an institution in 
which brilliant innovation in mathematics, computing and technology 
has coexisted with gross invasions of societal privacy. 


White Trash: The 400-Year Untold History of Class in America 
Nancy Isenberg VIKING (2016) 

Crackers, clay-eaters and “poor white trash”: the white US 
underclass has endured crass labelling from colonial times. That 
marginalization begs vast questions about US democracy, argues 
historian Nancy Isenberg. Her powerful social and cultural history 
uncovers new facets of known stories, from class conflict in the 
American Civil War to the sterilization of destitute whites by interwar 
eugenicists (V. Nourse Nature 530, 418; 2016). At once brutal and 
enlightening, this is the chronicle of a dispossessed people caught in 
rural stasis, and the social and political forces that keep them there. 


Innovation and Its Enemies: Why People Resist New Technologies 
Calestous Juma OXFORD UNIVERSITY PRESS (2016) 

From smart grids to new commodities, innovation disrupts by default 
— and if it is truly transformative, can trigger controversy and policy 
headaches. Sustainable-technologies expert Calestous Juma explores 
those tensions in this original study. He follows coffee from Ethiopia 
through Europe as it is embraced and denounced, shaping economies, 
technologies and industries. He looks at the advent of electricity and 
transgenic crops. For the pace of innovation and institutional change 
to synchronize, he concludes, both nimble leadership and rigorous, 
respectful public education must be brought into play. 


Tide: The Science and Lore of the Greatest Force on Earth 

Hugh Aldersey-Williams VIKING (2016) 

More than 40% of humanity lives within 150 kilometres of a coast, 
yet a clear understanding of tides — that oceanic phenomenon 
driven by the gravitational lock of Earth and Moon — is rare. Science 
writer Hugh Aldersey-Williams’s corrective meshes a history of 
the science (by way of Aristotle, Galileo and Isaac Newton, among 
others) with tide-related technologies and tidally sculpted events. It’s 
an eloquent ebb and flow, from observations of a 13-hour tidal cycle 
in a Norfolk salt marsh to passages on the legendary maelstroms of 
Nova Scotia and California’s body-surfing fish, the grunion. 


The Radium Girls 

Kate Moore SIMON & SCHUSTER (2016) 

In the 1910s, radium was marketed as a cure-all, incorporated 
into drinking water, cosmetics and even jockstraps. Kate Moore’s 
harrowing chronicle traces how a number of young US women, 
hired to paint military timepieces with radium-laced paint, paid the 
price: many succumbed to radiation poisoning and died hideous 
deaths. Ultimately, the landmark case won by five of them inspired 
globally important research into radiation and its impacts — 
including longitudinal studies with survivors. Barbara Kiser 

Maurice Strong (front left, at the 1972 UN Conference on the Human Environment) sought an alternative to the GDP. 


Nordhaus also took into account environmental 
costs — and the value of work 
in the home — in their 1972 proposal for 
a metric called the Measure of Economic 
Welfare. 

Another challenge to convention came 
from Mahbub ul Haq and Amartya Sen, 
who in 1990 created the now widely used 
UN Human Development Index (HDI), 
which includes factors such as life expec- 
tancy and education. 

More social scientists are now exploring 
the definition and use of economic statis- 
tics. There is also policy interest alongside 
the scholarly debate. In 2008, then-French 
president Nicolas Sarkozy set up a commis- 
sion led by Sen, Stiglitz and fellow econo- 
mist Jean-Paul Fitoussi to investigate the 
measurement of economic well-being. And 
UK economist Charles Bean’s 2016 Inde- 
pendent Review of UK Economic Statistics 
(see go.nature.com/1tvadaj) raises funda- 
mental questions about GDP’s viability in a 
modern economy, for example concerning 
its mismeasurement of digital activity. 


ONE OR MANY 

The balance of opinion in economics 
currently favours supplementing GDP with 
a dashboard that incorporates measures of 
environmental impacts, health and social 
indicators, as Costanza neatly summarized 
in his 1997 article (see also R. Costanza et al. 
Nature 505, 283-285; 2014). Economists are 
taking considerable interest in the meas- 
urement debate, although oddly, Masood 
claims that the profession is ignoring the 
issue. His own call for a nuanced metric that 
factors in natural capital and human well- 
being sticks to one indicator. He thinks that 
GDP is so tightly woven into the economic 
fabric that anything more complicated than 
a single number 
… to GDP have a 
serious drawback. They hide relative valuations 
of their components, whereas GDP 
makes these explicit because it uses mar- 
ket prices. For instance, Martin Ravallion, 
former director of research at the World 
Bank, notes that the HDI implicitly val- 
ues poor lives much less than rich ones. 
Because income and human life expec- 
tancy are combined into one index, there 
is an implied value of just US$0.51 for an 
extra year of life in Zimbabwe, compared 
to several thousand dollars in rich countries 
(see M. Ravallion Troubling Tradeoffs 
in the Human Development Index http://doi.org/d8d2cr; 
World Bank, 2010). This 
flaw could be corrected, but the point is 
that any single index internalizes such 
trade-offs. 

The debate over whether to use a dash- 
board or a single indicator is unresolved. 
Interest among economists, other social 
scientists and environmentalists has 
climbed in recent years, but there is much to 
research and discuss on how best to meas- 
ure economic welfare, taking into account 
sustainability and the quality of life, before 
a new international standard is defined and 
adopted. 

In hindsight, the original debate about 
GDP looks more compressed than it really 
was — some economists were still disput- 
ing it into the 1950s. A new shift will take 
just as long, but it is definitely under way. 
And about time too, for the reasons that 
The Great Invention explains so clearly. 


Diane Coyle is a professor of economics 
at the University of Manchester, UK, and 
author of GDP: A Brief but Affectionate 
History. She is also a member of the UK 
government's Natural Capital Committee 
and an Office for National Statistics 
Fellow. 

e-mail: diane.coyle@manchester.ac.uk 


23 JUNE 2016 | VOL 534 | NATURE | 473 
2016 Macmillan Publishers Limited. All rights reserved. 


Correspondence 


Commit to equity for 
women researchers 


Heads of research agencies from 
nearly 50 countries — large 
and small, with developed and 
emerging economies — adopted 
a Statement of Principles and 
Actions Promoting the Equality 
and Status of Women in Research 
at the Global Research Council’s 
fifth annual meeting last month 
in New Delhi (see go.nature.com/lyqtyg). 

According to a report 
commissioned by the Science and 
Engineering Research Board of 
India and Research Councils UK, 
which hosted the meeting, women 
make up only 11% of full science 
and engineering professors in 
the European Union, less than 
25% of academics in Asia and less 
than 5% of researchers in some 
Middle Eastern countries (see 
go.nature.com/1luywmgu). The 
report echoes statistics from the 
US National Science Foundation, 
of which I am director (see 
go.nature.com/1rpvmrk for 
the Science and Engineering 
Indicators). 

At the meeting, we gained 
greater awareness of long- 
standing historical obstacles to 
women’s participation in certain 
fields, and of the importance of 
including gender considerations 
in research design and outcome 
analysis. Each of us came 
away with a firmer idea of the 
opportunities to lead within 
our jurisdictions, and in a wider 
policy context. 

The national research heads 
agreed to “expect and encourage 
improved equality and diversity 
policies and practices” within 
their respective research 
provinces, and recommended 
a list of actions. These included 
diversity training, recognizing 
unconscious bias, implementing 
family-friendly policies and 
creating pathways for women 
to rise to leadership positions. 
We agreed to collect follow-up 
data and make them available for 
comparative analysis. 

Only by supporting the best 
talent — wherever it hails from 
— can we truly encourage and 
support research with the greatest 
academic, economic and societal 
impacts. Ensuring global equity 
for women in research requires 
that we each make a personal 
commitment to action. 

France A. Córdova National 
Science Foundation, USA. 
acollins@nsf.gov 


Don’t bank African 
rhinos in Australia 


The Australian Rhino Project (see 
go.nature.com/28c8s29) aims to 
move 80 rhinoceroses from South 
Africa to Australia by 2019 as 
conservation ‘insurance’ against 
the poaching epidemic — at a cost 
of about US$3.5 million. The first 
6 will go this year. In our view, 
this project is diverting funds 
and public interest away from the 
actions necessary to conserve the 
animals in Africa. 

The scheme is supported by 
the South African and Australian 
governments, academic 
institutions in Australia, and 
corporations and conservation- 
management organizations. Its 
cost equates to more than the 
anti-poaching budget of South 
African National Parks for 2015. 
We suggest that this money would 
be better spent on local, on-the- 
ground action in South Africa or 
on education programmes in Asia 
to reduce demand for rhino horn. 

Africa's rhinos are not even the 
highest priority in pachyderm 
conservation, particularly 
because only white rhinos from 
private collections are to be 
moved. The global estimated 
populations of white and black 
rhinos are 20,170 and 4,880, 
respectively — still further from 
extinction than Indian (2,575), 
Sumatran (275) and Javan (60) 
rhinos. 

We feel that the project has 
echoes of colonial times, when 
African resources were exploited. 
Taking biodiversity assets such 
as rhinos for ‘safe keeping’ in the 
West seems to us as patronizing 
and disempowering as the theft of 
cultural artefacts. 


Matt W. Hayward* Bangor 
University, UK. 
m.hayward@bangor.ac.uk 

*On behalf of 4 correspondents (see 
go.nature.com/1w32n9q for full list). 


Freelance scientists 
need EU for support 


As ‘freelance’ scientists, we 
undertake research jointly 
with academic institutions 
and provide Earth-science 
modelling services for clients — 
an alternative career path that 
European Union funding enables 
us to pursue. If the United 
Kingdom chooses to leave the 
EU after this week’s referendum, 
small private research 
organizations and independent 
researchers could be doomed. 

Independent researchers 
cannot apply for funding from UK 
research councils. Private research 
organizations need demonstrable 
in-house research capacity and 
a minimum of ten researchers. 
These eligibility criteria are at 
odds with those of the UK arts 
councils and the European 
Commission, which consider 
proposals from anyone with a 
track record in their discipline. 

With 88% of UK postdocs 
never securing a tenured 
position (The Scientific Century: 
Securing our Future Prosperity; 
Royal Society, 2010), these 
requirements need to be relaxed 
(see also Nature 520, 144-147; 
2015). Entrepreneurial young 
scientists could then continue 
their research without the 
backing of a university. 

For the United Kingdom to 
maintain its competitive edge, 
funding bodies need to recognize 
that the research landscape is 
changing. In this era of digital 
connectivity, scientists can still 
be embedded in the research 
community while working 
outside traditional research 
organizations. 

Cécile B. Ménard, Melody 
Sandells CORES Science and 
Engineering, Burnopfield, 
Newcastle upon Tyne, UK. 
cecile.menard@coresscience.co.uk 


Carry on celebrating 
Mendel’s legacy 


I disagree with Gregory Radick’s 
strategy for teaching modern 
genetics (Nature 533, 293; 
2016). In my view, we should not 
discard the legacies of Gregor 
Mendel, William Bateson, 
Walter Sutton, Thomas Hunt 
Morgan and their ilk, whose 
beautiful science continues to 
provide the best explanations for 
inheritance. 

I teach basic genetics to 
veterinary students, who learn 
the laws of inheritance without 
any historical context, and to 
biology students, who learn 
the scientific method and how 
it influenced the development 
of genetic concepts. The 
biologists revisit hypotheses 
proposed to account for the same 
observations — such as Bateson’s 
and W.F.R. Weldon’s contrasting 
views of inheritance. They come 
to understand that Mendel’s 
hypothesis of hereditary units 
(‘alleles’) explains the data better. 
They learn that theories and 
hypotheses are not immutable, 
that science is incomplete, and 
that every discovery stimulates 
new questions. 

With the Boveri-Sutton 
chromosome theory, it became 
clear that Mendelian inheritance 
is indeed the core of genetics. It 
underpins association-mapping 
studies, population genetics 
and clinical genetics. Such 
new information continues to 
corroborate Mendel’s hypothesis 
of inheritance. There is no need 
to remove Mendel from his 
honorary position in the genetics 
curriculum to spark creative 
science. 

Tatiana T. Torres Institute of 
Biosciences, University of São 
Paulo, Brazil. 
tttorres@ib.usp.br 


CONTRIBUTIONS 
Correspondence may be 
sent to correspondence@nature.com 
after consulting 
go.nature.com/cmchno. 




NEWS & VIEWS 


For News & Views online, go to 
nature.com/newsandviews 


ASTROPHYSICS 


Recipe for a black-hole merger 


The detection of a gravitational wave was a historic event that heralded a new phase of astronomy. A numerical model of the 
Universe now allows researchers to tell the story of the black-hole system that caused the wave. SEE LETTER P.512 


J. J. ELDRIDGE 


The first gravitational-wave source was 
detected on 14 September 2015. What 
surprised some was that the signal 
came from the merger of two black holes, 
each about 30 times the mass of the Sun. Now, 
on page 512, Belczynski et al. not only show 
that such a system can arise naturally from 
our understanding of how the stars in binary 
systems interact, but also unlock the history 
of the black holes from their birth as two 
massive stars. 

Other groups have sought to characterize 
the source of the gravitational wave (now 
known as GW 150914), but what makes 
Belczynski and colleagues’ work stand out is 
that they have created a numerical model of 
the Universe that allows every phase of the 
evolution of binary stars to be followed, from 
the birth of the Universe to the present. This 
enabled them to search through the list of 
observable black-hole binaries to find those 
that match the parameters of the gravitational- 
wave source. They then tracked back each 
candidate source's evolution to estimate the 
relative probability that the source could have 
caused the event, and thus identify which was 
the most likely. 

The authors conclude that the black holes 
probably started off as two stars that had 
masses 40 to 100 times that of the Sun and 
were born about 2 billion years after the Big 
Bang. These stars turned into black holes 
after a further 5 million years, and merged 
10.3 billion years after that, emitting the 
gravitational-wave signal that was detected 
1.2 billion years later (Fig. 1). Other scenarios 
are possible, but less likely. 
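
As a rough consistency check on this timeline, the quoted intervals can simply be added up. The short Python sketch below does so; the variable names are illustrative, and the 13.8-billion-year age of the Universe is a commonly quoted value assumed here, not a figure from the text.

```python
# All durations are in billions of years (Gyr), as quoted in the text.
birth_after_big_bang = 2.0    # "about 2 billion years after the Big Bang"
time_to_black_holes = 0.005   # "a further 5 million years"
time_to_merger = 10.3         # "10.3 billion years after that"
signal_travel_time = 1.2      # "detected 1.2 billion years later"

# Epoch of detection, measured from the Big Bang.
detection_epoch = (birth_after_big_bang + time_to_black_holes
                   + time_to_merger + signal_travel_time)

# Assumption: commonly quoted age of the Universe, not taken from the article.
ASSUMED_AGE_OF_UNIVERSE = 13.8

gap = ASSUMED_AGE_OF_UNIVERSE - detection_epoch
print(f"Detection epoch: {detection_epoch:.3f} Gyr after the Big Bang")
print(f"Gap to assumed present age: {gap:.3f} Gyr")
```

The intervals sum to about 13.5 billion years, close to the assumed present age, which is what one would expect given that the stars' birth is quoted only as "about" 2 billion years after the Big Bang.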

The black holes were monsters, and the 
results show that their progenitor stars would 
have been some of the brightest and most 
massive in the Universe. If the proposed age 
of the stars’ formation is correct, then they 
might have contributed to the reionization of 
the Universe — one of the key events in the 
Universe’s evolution. It is also likely that the 
stars were relatively pure in composition: they 
consisted mostly of hydrogen and helium, 
and contained less than 10% of the heavy ele- 
ments (such as carbon, oxygen and iron) that 
pollute our Sun. This indicates that the stars 
would have been in a small dwarf galaxy, rather 
than a large spiral galaxy, such as our own 
Milky Way. 


[Figure 1 labels: Gas envelope; Black hole.] 

Figure 1 | Key interactions that led to a black-hole merger. Belczynski et al. used numerical models 
of the Universe to unlock the history of the black-hole binary system that caused the gravitational wave 
reported in 2015. a, They propose that one of the two stars in the progenitor binary system exploded as 
a supernova (not shown), forming a black hole. b, That was engulfed by the second star as it evolved and 
expanded, generating a system in which the two objects shared the same gas envelope. c, Interaction 
between the two objects gradually decreased the distance between them, and the second star formed a 
black hole. d, The two black holes continued to get closer by radiating gravitational waves, eventually 
merging and generating gravitational waves strong enough to be detected. 

This study is important for two reasons. 
First, GW 150914 provides an exciting test 
for stellar evolution theory. Previously, core- 
collapse supernovae represented the latest 
stage of a star’s life that could be used to 
constrain the nature of the progenitor stars. 
Belczynski et al. have gone beyond that to the 
final event that occurs within a stellar binary 
that has already survived two supernovae. 
Their work, therefore, places firm constraints 
on stellar evolution and on how stars die in 
supernovae. Second, it provides a new way to 
measure the accuracy of models of star for- 
mation and cosmic evolution throughout the 
history of the Universe. 

There are, of course, caveats and assump- 
tions that add uncertainty to Belczynski and 
colleagues’ model. One uncertainty is how 
massive the black hole formed by a star can be; 
this is determined by how explosive the black- 
hole-forming supernovae are. The explo- 
sive nature of massive stars is a hot topic of 
research, with some evidence suggesting that 
black holes can form directly from stars with- 
out a supernova, which is what Belczynski and 
colleagues assume. But stars might also form 
black holes and explode. In binary systems, 
this would affect the nature of the final black- 
hole system, and the time taken for the 
black holes to merge. 

Another uncertainty involves an inter- 
mediate phase of binary-star evolution. As 
stars in binaries evolve, their radii increase, 
sometimes growing to the size of their orbit 
so that they get in each other’s way — this is 
called the common-envelope phase. Typically, 
the star that grows first loses its outer envelope 
of gas, leaving a small, hot core that eventually 
forms a black hole. The binary’s orbit decreases 
in size during this process. During the merger 
of two black holes, the closer the two objects 
were when they formed, the sooner they will 
merge. But researchers do not know how much 
the orbit can shrink in the common-envelope 
phase despite decades of work dedicated to 
finding an answer. 
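
To see why the initial separation matters so much, one can use the standard textbook expression (the Peters formula) for the time a circular binary of two point masses takes to merge through gravitational-wave emission alone: t = (5/256) c^5 a^4 / (G^3 m1 m2 (m1 + m2)). This formula is not taken from Belczynski and colleagues' model. The Python sketch below, with hypothetical function and constant names and rounded physical constants, only illustrates the steep dependence on the separation a.

```python
# Rounded physical constants in SI units (assumed values for illustration).
G = 6.674e-11             # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8               # speed of light, m s^-1
M_SUN = 1.989e30          # solar mass, kg
SECONDS_PER_GYR = 3.156e16


def merger_time_gyr(m1_solar, m2_solar, separation_m):
    """Merger time (Gyr) of a circular binary driven only by gravitational waves.

    Uses the Peters expression t = (5/256) c^5 a^4 / (G^3 m1 m2 (m1 + m2)).
    Masses are in solar masses, the separation in metres.
    """
    m1 = m1_solar * M_SUN
    m2 = m2_solar * M_SUN
    seconds = (5.0 / 256.0) * C**5 * separation_m**4 / (G**3 * m1 * m2 * (m1 + m2))
    return seconds / SECONDS_PER_GYR


# Two 30-solar-mass black holes at two hypothetical separations.
close = merger_time_gyr(30.0, 30.0, 1.0e10)
wide = merger_time_gyr(30.0, 30.0, 2.0e10)
print(f"Ratio of merger times when the separation doubles: {wide / close:.1f}")
```

Doubling the separation lengthens the merger time 16-fold, which is why the poorly known amount of orbital shrinkage in the common-envelope phase feeds so directly into predicted merger times.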

Future gravitational-wave signals may help 
astrophysicists to constrain both uncertain- 
ties, but for now, Belczynski et al. generate an 
‘optimistic’ and a ‘pessimistic’ model Universe, 
to assess the highest and lowest possible rates 
of black-hole mergers. They demonstrate that 
systems that would form black-hole binaries 
of the sort that generated GW 150914 would 
form in both models, and that the rate of black- 
hole mergers in the Universe matches that 
inferred from the gravitational-wave detec- 
tion. The authors also suggest that rotation of 
stars about their own axes is not required to 
explain most gravitational-wave sources, but 
it has been suggested that such rotation could 
increase the number of black-hole mergers. 
Nevertheless, there is still more work to 
be undertaken, and more physics to include 
in the models. 

Belczynski and colleagues’ study is 
tremendously exciting because it examines the 
effects of a new constraint on how stars and the 
Universe evolve, identified by GW 150914. 
With each gravitational-wave signal detected 
we'll learn something new. And with rumours 
that more events will be announced soon, 
we may not have to wait too long for the 
next lesson. 


J. J. Eldridge is in the Department of Physics, 
University of Auckland, Private Bag 92019, 
Auckland, New Zealand. 
e-mail: j.eldridge@auckland.ac.nz 


CELL BIOLOGY 


Membrane kiss mediates 
hormone secretion 


Communication between cells relies on hormone release from secretory granules, 
but how these vesicles fuse with cell membranes is unclear. An imaging study 
provides in vivo evidence for a stable intermediate fusion step. SEE LETTER P.548 


TOLGA SOYKAN & VOLKER HAUCKE 


Cells communicate with each other by 
secreting small messenger molecules 
such as hormones or growth factors, 
many of which are stored in vesicles called 
secretory granules. To release these messengers 
to the cell exterior, secretory granules 
fuse with the cell membrane through a process 
called exocytosis. On page 548, Zhao 
et al. show that exocytosis occurs through the 
reversible formation of a hemi-fused intermediate, 
in which only one of the two leaflets 
(lipid layers) of the cell membrane has merged 
with the secretory granule’s membrane. These 
results answer the long-standing question of 
whether membrane fusion involves a hemi-fused 
intermediate and also provide in vivo 
evidence for the ‘kiss-and-run’ model of 
secretory-granule exocytosis. 


[Figure 1 labels: Docked granule; Hemi-fusion/fission intermediate; Fully fused granule; Outer lipid layer; Cell exterior.] 

Figure 1 | A hemi-fused intermediate in membrane fusion. Zhao et al. 
observed in chromaffin cells that secretory granules that have docked with 
the cell membrane can undergo reversible hemi-fusion, in which the leaflets 
of the outer lipid layer of the secretory granule and the cytoplasmic-facing lipid 
layer of the cell membrane merge. In the authors’ experiments, the inner 
lipid layer of the hemi-fused secretory granule maintains a tight seal that 
prevents access of external dye marker molecules (red). When the 
secretory granules undergo full fusion with the cell membrane, the external 
dye can enter the secretory granules. Fully fused secretory granules can 
revert to the hemi-fusion/fission intermediate in a process that depends on 
binding of the protein dynamin and is regulated by cytoplasmic calcium 
(Ca²⁺) levels. 


Examples of hormones that are released by 
secretory granules include insulin in pancreatic 
β-cells and adrenaline, found mainly in chromaffin 
cells of the adrenal glands. Hormone 
secretion requires the granules to dock with the 
cell membrane and then partially or completely 
merge with it, allowing hormone release into 
the extracellular space. However, the regulatory 
mechanism that underlies this type of 
membrane fusion, and the nature of the intermediates 
involved, have long been debated. 

Zhao et al. used a high-resolution optical-microscopy 
approach to observe membrane 
fusion directly in living chromaffin cells in real 
time and in three dimensions. To trigger secretory-granule 
release, the chromaffin cells were 
electrically stimulated while being bathed in a 
cell-impermeable fluorescent marker dye. The 
granule’s lipid layer maintains a tight seal, so 
that the dye can gain access to the interior of a 
docked secretory granule only when full fusion 
of the granule and cell membrane occurs. Use 
of this dye, along with simultaneous monitoring 
of the cytoplasmic leaflet of the cell 
membrane using a lipid-bound fluorescent 
protein, enabled the authors to analyse 
the membrane changes that occur 
during fusion. 

As the granules begin to fuse, diffusion 
of the fluorescent marker protein from the 
inner leaflet of the cell membrane to the outer 
leaflet of the granule membrane results in an 
increase in fluorescence in the granule that 
serves as a reporter for changes in the fusing 
membranes. In more than half of all fusion 
events, the authors observed a simultaneous 
influx of fluorescent dye into the secretory-granule 
lumen and rise in membrane fluorescence 
on the docked secretory granule. This 
type of fluorescence change would be expected 
if secretory granules fuse completely with the 
cell membrane. 

The authors also frequently observed 
events in which a rise in fluorescence of the 
cell-membrane marker on the secretory gran- 
ules preceded dye influx by several seconds, 
or in which no dye influx into the secretory 
granule was detected during 40 seconds of 
observation. It was surprising that so many 
such events were detectable and that they are 
stable over many seconds. From these and 
further control experiments, the authors con- 
cluded that these events must correspond to 
hemi-fusion, thought to be a metastable state 
in which the outer leaflet of the secretory gran- 
ule and the cytoplasm-facing leaflet of the 
cell membrane have merged, while the inner 
leaflet of the secretory granule and the extra- 
cellular-facing leaflet of the cell membrane 
remain separate. 
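
The logic used to sort fusion events by their two fluorescence signals can be summarized in a few lines. The Python sketch below is only an illustration of that reasoning, not the authors' analysis pipeline: the function name, the 0.5-second tolerance for calling two signals 'simultaneous' and the category labels are assumptions introduced here.

```python
from typing import Optional


def classify_event(membrane_rise_s: float,
                   dye_influx_s: Optional[float],
                   tolerance_s: float = 0.5) -> str:
    """Classify one fusion event from the onset times of its two signals.

    membrane_rise_s: time at which membrane-marker fluorescence rises on the granule.
    dye_influx_s: time at which external dye enters the granule, or None if no
        influx was seen during the observation window.
    tolerance_s: hypothetical window within which the signals count as simultaneous.
    """
    if dye_influx_s is None:
        # Leaflets merged but no pore opened while the granule was observed.
        return "hemi-fusion"
    delay = dye_influx_s - membrane_rise_s
    if abs(delay) <= tolerance_s:
        # Both signals rise together: the granule fused completely.
        return "full fusion"
    if delay > tolerance_s:
        # Membrane marker arrives first, dye follows seconds later.
        return "hemi-fusion, then full fusion"
    # Dye before membrane marker is not covered by the description in the text.
    return "unclassified"


print(classify_event(1.0, 1.1))
print(classify_event(1.0, 6.0))
print(classify_event(1.0, None))
```

A real analysis would work from fluorescence traces rather than two timestamps, and would need a justified threshold for simultaneity.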

What might be the advantages of this type of 
fusion mechanism? Unlike neuronal synaptic 
vesicles, which release neurotransmitter molecules 
in an all-or-nothing, ‘quantal’ fashion, 
secretory granules can partially secrete their 
hormone content. Such partial release has been 
postulated to be mediated by kiss-and-run 
exocytosis, which is a model for how secretory 
granules open and close a fusion pore through 
which molecules pass between the granule and 
the cell membrane. Whether the fusion pore is 
made of lipid or protein, or both, is not known. 
A stable hemi-fused intermediate might indi- 
cate the existence of a reversible fusion process 
that would enable partial release during secre- 
tory-granule exocytosis, and might underlie 
fusion-pore opening and closing. 

An identical structure to a hemi-fused 
intermediate could also arise if a fully fused 
granule underwent fusion-pore closure 
through fission (the splitting of a membrane 
into two separate entities). In this context, it 
would be called a hemi-fission intermediate. 
To probe whether fusion-pore opening and 
closing are reversible processes that proceed 
through a common intermediate, Zhao et al. 
tracked the movement of the fluorescent 
dye and membrane markers over time. They 
frequently observed fluorescence dynamics 
consistent with closure of the fusion pore 
through a hemi-fission state. 

The authors then investigated the role of 
dynamin, a protein involved in endocytosis — 
a cell-membrane-dependent process in which 
materials are transported into cells. Dynamin 
can bind to narrow lipid ‘necks’ at places where 
membrane pinching occurs, and Zhao et al. 
found that its depletion or inhibition tipped 
the balance towards full fusion of both layers of 
the granule and cell membranes at the expense 
of hemi-fusion events. Conversely, the hemi- 
fused state seemed to be stabilized by a high 
influx of calcium into the cytoplasm. Overall, 
the authors’ data indicate that secretory-granule 
fusion and fission are reversible processes, at 
least in chromaffin cells, with the transition 
from hemi-fusion to full fusion being counter- 
acted by dynamin and regulated by cytoplasmic 
calcium (Fig. 1). Consistent with this model, 
Zhao et al. occasionally observed reversible 
opening, closing and reopening of fusion pores 
in the same docked secretory granules. 

Although hemi-fusion has previously been 
observed and characterized in reconstituted 
systems in vitro, Zhao and colleagues’ work 
is the first demonstration of this process in 
living cells. The new results indicate that this 
intermediate fusion state is a physiologically 
relevant and surprisingly stable intermediate 
en route to the exocytic release of hormones 
and related molecules. It has been suggested 
that hemi-fusion underlies the fusion of yeast 
membrane-bound structures called vacuoles, 
and that it is also probably responsible for the 
delayed fusion pathway in reconstituted vesicles 
in vitro. However, the authors’ model of 
exocytic membrane fusion is difficult to rec- 
oncile with the idea that the fusion pore is 
lined with transmembrane proteins — as has 
been postulated from mutational analysis of 
the transmembrane segments of key exocytic 
proteins — because transmembrane proteins 
span both layers of the membrane and 
therefore would be excluded from the centre 
of the hemi-fused intermediate. 

Whether a mechanism that involves hemi- 
fused intermediates operates in neurons to 
release neurotransmitters also remains an 
open question. However, the direct visualiza- 
tion of membrane fusion during neurotrans- 
mission presents special challenges, such as 
the speed of exocytosis, the small size of syn- 
aptic vesicles and the complex architecture of 
neurons in the brain. Finally, the observa- 
tion that dynamin regulates the partitioning 
between hemi- and full fusion or fission events 
lends further support to the idea that mem- 
brane fission during endocytosis and other 
vesicle-budding events proceeds through 
hemi-fission intermediates. 


Quantum simulation of 
fundamental physics 


Gauge theories underpin the standard model of particle physics, but are difficult 
to study using conventional computational methods. An experimental quantum 
system opens up fresh avenues of investigation. SEE LETTER P.516 


EREZ ZOHAR 


There are many questions still to be 
answered about the standard model of 
particle physics, which describes the 
fundamental forces and interactions of nature. 
On page 516, Martinez et al. report a pioneering 
experiment in which calcium ions that are 
trapped and controlled by electromagnetic 
fields form a quantum simulator of elementary 
particle physics. This is a first experimental 
step towards the use of quantum simulators to 
answer some of those outstanding questions. 

Theoretical physics often involves problems 
that do not have a simple mathematical solu- 
tion. This quandary is usually overcome using 
numerical calculations performed by conven- 
tional (classical) computers. Some problems, 
however, cannot be solved by these techniques, 
and require other methods, especially when 
direct experimental study is also impossible 
or difficult. 

Physicist Richard Feynman suggested that, 
to simulate the quantum behaviour of physical 
systems, other quantum systems must be used 
— quantum computers. This concept, called 
quantum simulation, is beautifully simple. 
Consider a quantum system, A, that cannot 
be studied by conventional theoretical and 
experimental methods, and another quantum 
system, B, that can be built and controlled with 
high precision in the laboratory. If the physical 
components of B, and the interactions between 
them, mimic and behave like those of A, then 
B is a quantum simulator of A. Once B is built, 
tuned and operated, experimental study of it 
effectively serves as a study of A. 

Quantum simulators can be either analog, 
in which system B simply ‘behaves like’ system A 
because its dynamics and interactions 
exactly or approximately map those of A, or 
digital, in which a sequence of operations acts 
on the components of B and possibly on some 
auxiliary elements, generating dynamics that 
are equivalent to those of A with controlled 
precision. The simulating systems are often 
atomic or optical, and have included systems 
of cold atoms or ions trapped by electromag- 
netic fields. These have been designed (and 
some have been built) to simulate many areas 
of physics, ranging from condensed-matter 
physics to gravitational effects. 
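
The digital approach can be illustrated with a toy calculation. A common way to build such a sequence of operations is a Trotter decomposition, in which evolution under a sum of non-commuting terms is approximated by many short, alternating steps of each term. The Python sketch below applies this to a single qubit with Hamiltonian σx + σz. This choice of system, the function names and the error measure are assumptions made for illustration and are not taken from the experiment discussed in this article.

```python
import cmath
import math


def matmul(a, b):
    """Multiply two 2x2 complex matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def step_x(theta):
    """Return exp(-i * theta * sigma_x)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -1j * s], [-1j * s, c]]


def step_z(theta):
    """Return exp(-i * theta * sigma_z)."""
    return [[cmath.exp(-1j * theta), 0.0], [0.0, cmath.exp(1j * theta)]]


def exact_evolution(t):
    """Return exp(-i * t * (sigma_x + sigma_z)), using (sigma_x + sigma_z)^2 = 2 I."""
    w = math.sqrt(2.0)
    c, s = math.cos(w * t), math.sin(w * t) / w
    return [[c - 1j * s, -1j * s], [-1j * s, c + 1j * s]]


def trotter_evolution(t, steps):
    """Approximate the evolution by `steps` alternating short x and z operations."""
    dt = t / steps
    one_step = matmul(step_x(dt), step_z(dt))
    result = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(steps):
        result = matmul(one_step, result)
    return result


def trotter_error(t, steps):
    """Largest entry-wise difference between the approximate and exact evolution."""
    approx, exact = trotter_evolution(t, steps), exact_evolution(t)
    return max(abs(approx[i][j] - exact[i][j]) for i in range(2) for j in range(2))


for n in (1, 10, 100):
    print(n, trotter_error(1.0, n))
```

Increasing the number of steps shrinks the discrepancy with the exact evolution, which is the sense in which a digital simulator offers controlled precision.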

The interactions between elementary parti- 
cles are a great candidate for quantum simula- 
tion. In the standard model of particle physics, 
such interactions are mediated by vector fields 
known as gauge fields, thanks to a special type 
of symmetry called local gauge invariance. 
Electrons, for example, interact according 
to the quantum theory of electrodynamics 
(QED, the simplest gauge theory) through 
the electromagnetic gauge field. Other fun- 
damental constituents of matter are quarks, 
whose interactions through the strong force 
are described by another gauge theory, quan- 
tum chromodynamics (QCD). 

QCD has several open questions. One is 
the phenomenon of confinement, in which 
quarks are bound together by the strong force 
to form composite particles called hadrons 
(which include protons and neutrons). The 
strong force prevents quarks from being iso- 
lated experimentally. The theoretical study 
of confinement is also difficult, and has been 
a subject of research for decades. A highly 
successful avenue for studying gauge theories 
is called lattice gauge theory, but using 
it for conventional computer simulations 
is still problematic for the study of several 
questions. 

The quantum simulation of lattice gauge 
theories has been a rapidly growing area of 
study over the past few years, and several pro- 
posals have been made for how such simulators 
could be realized. These simulators quantitatively 
map the simulated system — which is 
typically highly energetic — onto low-energy 
atomic and optical experimental systems. 
Martinez et al. report the first experimental 
realization of just such a quantum simulator. 


[Figure 1 labels: Electrode.] 

Figure 1 | Quantum simulation of a gauge theory. The quantum theory of electrodynamics (QED) is 
the simplest gauge theory (a type of field theory), and describes how particles of matter such as electrons 
interact through an electromagnetic field. Martinez et al. have used a system of four calcium ions 
confined by electromagnetic fields (not shown), generated by electrodes, to simulate a one-dimensional 
model of a variant of this theory known as lattice QED. Each ion serves as a quantum bit that can adopt 
one of two states, representing the presence or absence of particles, and the interactions between the ions 
are tailored to represent the gauge-field dynamics. The state of each ion, and the interactions between the 
ions, can be manipulated using laser beams (orange and purple). 


The authors simulated lattice QED in a 
one-dimensional space (a lattice Schwinger 
model) using a digital quantum simulator — 
a tailored quantum computer that consists 
of four trapped calcium ions controlled and 
manipulated by lasers (Fig. 1). Two energy 
levels of each ion form a quantum bit (a qubit), 
which represents the presence or absence of a 
particle of matter in the corresponding simu- 
lated theory. The gauge field is represented as 
interactions between the ions that are direct 
and exotic, yet experimentally implementable. 
This is achieved using a theoretical transfor- 
mation available in one dimension that elimi- 
nates direct manifestations of the field in the 
simulated model and allows it to be expressed 
in terms of matter. 

The quantum simulation of complicated 
gauge theories requires a non-trivial combi- 
nation of advanced technologies in atomic 
and optical physics. Martinez and colleagues 
therefore investigated a small version of a 1D 
lattice QED model, a relatively simple system 
that enabled their results to be compared with 
predictions, but that still demonstrates impor- 
tant features of more-complicated models. The 
authors’ quantum simulator did indeed repro- 
duce the expected physical behaviour of the 
simulated model with great accuracy. 

In future work, larger systems should be 
simulated that have a greater number of 
dimensions (to reveal further non-trivial types 
of interaction) and involve more-complicated 
simulated models such as QCD. Quantum 
simulators for many of these models have 
already been proposed — both analog and 
digital — for various gauge theories in different 
dimensions, mostly using cold atoms, but 
also trapped ions and superconducting qubits. 
The experimental requirements and feasibility 
of these proposals vary, because the simulators 
use different approaches and involve various 
simulated models, but they mostly require 
combinations of existing experimental tech- 
niques. Technological developments will 
help to make such experiments more achiev- 
able, even for the simulation of complicated 
models. As the first quantum simulator of a 
lattice gauge theory to be built, Martinez and 
co-workers’ system serves as a beacon that will 
lead gauge-theory physicists to the promised 
land of experimental realization. 

The authors’ work proves that it is indeed 
realistic to use quantum-optics techniques to 
study particle physics and fundamental forces. 
Further theoretical and experimental advances 
might enable quantum simulators to solve 
challenges such as study of the exotic phases of 
QCD, and to observe new phenomena. More 
generally, this realization of the great power of 
quantum simulation reminds us how wonderfully 
multidisciplinary physics is. 


EVOLUTION 


Gene regulation in transition 


An in-depth analysis of a close relative of animals, Capsaspora owczarzaki, 
provides clues to the changes in gene regulation that occurred during the 
transition to multicellularity. 


DAVID S. BOOTH & NICOLE KING 


The origin of all animals, from humans to 
sponges and comb jellies, can be traced 
back to a major event in evolutionary 
history: the transition to multicellularity. This 
transition was no doubt shaped by environmental 
changes — such as rising oxygen 
levels — and the evolution of cells that could 
engulf other, smaller cells. However, to fully 
understand what drove this seminal event, 
we must look to the genome. Writing in Cell, 
Sebé-Pedrós et al. report an investigation of 
gene regulation in a microscopic cousin of 
animals, Capsaspora owczarzaki. The study 
indicates that Capsaspora represents a transi- 
tional state in the evolution of gene-regulatory 
mechanisms, and provides a foundation for 
investigating how such mechanisms might 
have contributed to animal origins. 

More than 600 million years ago, a series 
of genetic innovations allowed the progenitors 
of animals to exploit emerging environmental 
niches on a changing planet. These 
progenitors cannot be studied directly, so 
how can we identify those genetic innova- 
tions that mattered most for animal origins? 


Most insights into pre-animal genomes have 
come from comparisons of extant animals 
and their close relatives, choanoflagellates 
and Capsaspora (Fig. 1). Contrary to expecta- 
tion, these studies revealed that much of the 
animal genetic toolkit (including the genes that 
encode cell-adhesion proteins such as integrins 
and cadherins, and those for vital signalling 
proteins such as receptor tyrosine kinases) 
is also expressed in Capsaspora and choano- 
flagellates’, indicating that many ‘animal’ genes 
pre-date animal origins. 

Of course, animals are more than the sum 
of their genes — it is the regulated expression 
of genes across space and time that helps to 
differentiate egg from embryo, leg from wing 
or bat from fly. In plants and fungi, as well as in 
animals, transcription factors drive the synthe- 
sis of messenger RNA by interacting with regu- 
latory regions called promoters that are located 
close to their target genes. Proximal control of 
transcription clearly pre-dates animal origins 
and is probably vital for all cellular life. 

By contrast, long-range transcriptional
regulation by DNA sequences called enhancers,
which can lie more than 10 kilobases
from the genes they regulate, has so far been
seen only in animals. Such regulation has been
hypothesized to underlie the spatial and temporal
coordination of cell differentiation that
defines animal development*. But whether
long-range enhancers are truly restricted to
animals has been unclear, because they are
often embedded in intricate transcriptional
networks and can be difficult to detect.

[Figure 1 tree tips: Capsaspora, Choanoflagellata*, Sponges*, Ctenophora*, Nematostella, Drosophila, Homo]

Figure 1 | Evolution of gene-regulatory mechanisms. Sebé-Pedrós et al.² report that two transcription
factors, Myc and Brachyury, control similar sets of genes in animals and in a close relative, Capsaspora
owczarzaki. This indicates that key gene-regulatory networks evolved before the origin of animals
(indicated by the blue line) and were later co-opted for animal development. By contrast, long-range
gene-regulatory elements called enhancers are not found in Capsaspora, but have been found in
Nematostella, an animal that branched off early in evolutionary history. Thus, enhancers might be
animal-specific (time window over which the evolution of long-range gene regulation might have occurred
is indicated in red). A full understanding of how the animal gene-regulatory landscape evolved will require
analyses of other early-branching animals such as sponges and Ctenophora (comb jellies), and other close
relatives of animals, such as Choanoflagellata, in which gene regulation has not yet been studied (marked *).

To investigate how different modes of 
transcriptional regulation may have set the 
stage for animal origins, Sebé-Pedrós et al.
established approaches for functional genom- 
ics in Capsaspora (functional genomics probes 
how dynamic interactions between proteins, 
RNA and the genome correlate with gene 
expression). Despite the fact that Capsaspora 
is a non-model organism, it offers several benefits
for such a study: it is easily cultured in the
laboratory; it transitions between unicellular 
and aggregative multicellular forms; and its 
genome encodes many transcription factors 
that are evolutionarily conserved in animals®. 

The authors report that, despite its relative 
simplicity, Capsaspora expresses two transcrip- 
tion factors that are integral to animal develop- 
ment — Myc and Brachyury. In animals, Myc 
serves as a master regulator of cell prolifera- 
tion. Brachyury controls a key developmental 
process called gastrulation: this produces the 
body’s three major cell layers, and the protein 
subsequently mediates differentiation of one 
of these layers, the mesoderm. In animals, 
both Myc and Brachyury function by binding 
to enhancers to regulate the transcription of 
a network of downstream genes”*. Remarkably,
Sebé-Pedrós et al. found that these
downstream gene networks are conserved in 
animals and Capsaspora. 

Given that cell proliferation is a shared 
feature of Capsaspora and animals, the con- 
servation of the Myc regulatory network in 
the two lineages may not be surprising. But it 
is surprising that Brachyury seems to regulate 
the same types of gene in animals and Capsas- 
pora, despite the fact that Capsaspora neither 
gastrulates nor produces mesoderm. Just as 
genes that animals use for cell adhesion and 
signalling evolved in the progenitors of animals 
before being co-opted for different functions 
in a multicellular context, it now seems that 
some gene-regulatory networks pre-date ani- 
mal origins and were recruited wholesale for 
the regulation of new developmental processes. 

Co-option is not the whole story, however. 
Innovations at the level of genes (such as 
that encoding the animal-specific signalling 
protein Wnt) and gene regulation (such as 
enhancer sequences) might also have con- 
tributed to animal origins. In contrast to the 
expansive intergenic DNA and long-range 
enhancers found in most animal genomes, the 
Capsaspora genome is compact. Despite looking
for signatures of long-range transcriptional
regulation at several stages of Capsaspora’s life 
cycle, Sebé-Pedrós et al. identified none.

Animals also seem to have evolved new
classes of promoter. Three types of animal promoter
have been identified’: type I and type III
promoters regulate genes that act during dis- 
tinct stages in development, whereas type II 
promoters direct ubiquitous gene expression. 
Sebé-Pedrós and colleagues detected type II
promoters in Capsaspora, but not types I or
III. Therefore, type I and III promoters might
be animal innovations. 

It will be exciting to explore what these 
findings mean for animal origins and early 
evolution. Future investigations into the thus- 
far-uncharacterized gene-regulatory land- 
scapes of sponges, comb jellies (ctenophores) 
and choanoflagellates promise to help pin- 
point how and when long-range enhancers 
and type I and III promoters first evolved.
However, the evolutionary distance between 
these organisms and the model animals that 
form the basis of our understanding of ani- 
mal gene regulation may render conserved 
molecular mechanisms unrecognizable by
functional-genomic approaches. More- 
over, other evolutionarily important gene- 
regulatory mechanisms may lie undiscovered in 
Capsaspora, choanoflagellates and animals that 
branched off early in the evolution of animals. 
Fully reconstructing gene regulation in the 
progenitors of animals will require studies in 
diverse relatives, integrating modern func- 
tional genomics with forward and reverse 
genetics — which respectively reveal the 
genes responsible for a particular trait, and the 
changes brought about by disrupting the function
of a particular gene. Fortunately, armed
with the functional-genomics insights from 
this study, and the establishment of forward 
genetics in choanoflagellates'’, this goal may be 
achieved in the not-too-distant future.
Synergy of a warm spring and dry summer


An analysis suggests that high carbon uptake by US land ecosystems during the 
warm spring of 2012 offset the carbon loss that resulted from severe drought over 
the summer — and hints that the warm spring could have worsened the drought. 


YUDE PAN & DAVID SCHIMEL 


Warmer springs and drier summers
are an expected consequence of
climate change¹. Warmer springs
should increase the carbon uptake of terrestrial
ecosystems by lengthening the growing
season, whereas drier summers should reduce 
uptake because of poor plant growth, espe- 
cially in drought years. In 2012, the conti- 
nental United States had the warmest spring 
on record, and one of the worst summer 
droughts in decades. What did these extremes 
do to the land carbon budget? The answer mat- 
ters because terrestrial carbon uptake helps to 
remove anthropogenic carbon dioxide emis- 
sions from the atmosphere. Writing in Pro- 
ceedings of the National Academy of Sciences, 
Wolf et al.’ conclude that the increased carbon 
uptake during the spring essentially offset the 
carbon lost during the summer — although the 
details of this phenomenon are rather complex. 

The effects of interactions between spring 
warming and summer drought on carbon 
budgets at continental and local scales have 
been reported previously*”, but it is only in 
the past few years that multiple data sources 
with which to evaluate large-scale climate 
effects and their local variations have become
widely available. The authors arrived at their
conclusions by comparing three data sets: eddy- 
covariance data that measure carbon exchange 
between the lowest part of the atmosphere (the 
boundary layer) and land biospheres over areas 
of approximately 1 square kilometre, gathered 
by 22 towers scattered across the United States; 
satellite estimates of the timing of plant growth; 
and regional carbon-budget estimates from 
CarbonTracker, a modelling system that uses
observations of atmospheric CO₂ levels and
gradients to infer surface fluxes of the gas over 
land. So what do the data show? 

The severe drought that occurred during 
the summer of 2012 encompassed more than 
half of the continental United States, with 
most of the affected regions falling into the 
two worst categories as defined by the US 
Drought Monitor (extreme and exceptional)’.
Accordingly, most of the towers reported a loss 
of carbon from their sites during this period, 
and recorded that the annual carbon budgets 
did not balance. Meanwhile, CarbonTracker 
suggested that carbon gain during the spring 
(0.24 petagrams of carbon; 1 Pg is 10¹⁵ grams)
and carbon loss during the summer (0.23 Pg) 
were almost equal for the continental United 
States as a whole. 

However, there was considerable variability 
within that picture. Eastern temperate forests
(Fig. 1a) vigorously sequestered carbon during
the spring, and this carbon gain (0.18 Pg)
slightly more than offset the summer 
carbon loss (0.16 Pg) from the Great Plains 
(Fig. 1b) — the area most affected by drought, 
and which accumulated significantly less car- 
bon than in an average year. Overall, carbon 
uptake for the lands of the continental United 
States had increased, rather than reduced, by 
the end of the year (a rise of 0.11 Pg C yr⁻¹),
with the surplus resulting from increased 
carbon uptake during the autumn. 
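
The seasonal bookkeeping behind these figures can be checked with a few lines of arithmetic. The sketch below is illustrative only: it uses the CarbonTracker values quoted above (in Pg of carbon), and the helper name `net_balance` is hypothetical, not part of the study's analysis.

```python
# Seasonal carbon figures quoted in the text (Pg of carbon).
# Positive numbers are uptake by the land, negative numbers are losses.
SPRING_GAIN_US = 0.24        # continental United States, spring 2012
SUMMER_LOSS_US = -0.23       # continental United States, summer 2012
SPRING_GAIN_EAST = 0.18      # eastern temperate forests, spring
SUMMER_LOSS_PLAINS = -0.16   # Great Plains, summer


def net_balance(gain, loss):
    """Return the net of a gain and a (negative) loss, rounded to 0.01 Pg."""
    return round(gain + loss, 2)


if __name__ == "__main__":
    print("US spring + summer:",
          net_balance(SPRING_GAIN_US, SUMMER_LOSS_US), "Pg C")
    print("Eastern forests vs Great Plains:",
          net_balance(SPRING_GAIN_EAST, SUMMER_LOSS_PLAINS), "Pg C")
```

Both nets come out small and positive, which matches the statement that spring gains slightly more than offset summer losses; the larger annual surplus is attributed in the text to autumn uptake, which this sketch does not model.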

Wolf and colleagues propose that the 
spring warming and summer drought were 
physically coupled through interactions 
between the land surface and atmosphere. 
Simply put, ecosystems entered the sum- 
mer with a relative water deficit because 
water was used up earlier than normal 
during the warmer spring. The deficit led to 
a reduction in evaporative cooling, which 
increased the effects of summer heating, 
causing water stress. 

The authors go on to suggest that early 
warming might even have reinforced weather 
patterns, increasing the probability or the 
severity of summer drought. Confirming this 
will require a more comprehensive analysis 
and diagnosis, including measurements from 
more eddy-covariance towers, but is well 
within the realm of possibility. Clear evidence 
of such a link would undoubtedly help the 
public, policy-makers and resource managers 
to prepare strategies for adapting to droughts 
in the future.

Figure 1 | Seasonal and regional variations of carbon uptake in the continental United States. a, Eastern temperate forests grew vigorously during the warm
spring of 2012, and took up more carbon than normal for this season. b, The subsequent hot, dry summer caused crops to fail in the Great Plains, and carbon
uptake in this region was lower than normal. Wolf et al.² report that the spring carbon uptake offset the summer carbon losses across the continental United States.

A strength of Wolf and co-workers’ study
is that it combines in situ eddy-covariance
measurements, atmospheric observations
and remote-sensing data. The eddy-covariance
data provide the most direct evidence for
seasonal changes in terrestrial carbon uptake,
and are the only data that directly constrain
the CarbonTracker and satellite estimates, by
quantifying both the carbon flux and the full 
energy balance of water-temperature inter- 
actions. The remote-sensing data provide 
the best insight into the timing of biological 
activity across the continent, whereas the 
atmospheric analyses allow the local fluxes 
and processes to be understood in the context 
of the overall carbon budget. In the future, a 
more sophisticated synthesis of the differ- 
ent data will greatly improve the accuracy of 
analyses of carbon and water exchange 
between the land and atmosphere. 

A limitation of the study is that the tower 
sites weren't specifically placed to sample the 
dominant carbon-flux anomalies that were 
revealed by CarbonTracker and the satellite 
data. For instance, the largest region of spring- 
time carbon-uptake anomalies occurred in the 
southeastern United States, where there are no 
flux towers. The largest region of midsummer 
carbon-loss anomalies occurred in the Great 
Plains, where the two sites used in the study 
represent grasslands, rather than the dominant 
agricultural landscapes of this region. 

In addition, the current tower network isn’t 
dense enough to cover climate events such as 
the extreme year of 2012. A facility called the 
National Ecological Observatory Network 
(with which one of us, D.S., was associated for 
several years), designed to sample climate con- 
ditions optimally, will come online in the next 
few years’ and provide uniform coverage of the 
continental United States. Climatologists have 
long designed networks to study spatial pat- 
terns, whereas ecologists have tended to rely 
on local field studies and extrapolated their 
findings to larger areas on the basis of vegeta- 
tion types or other classifications. A reference 
network that covers all spatial components and 
biomes is essential for this type of extrapola- 
tion in future studies. 

Wolf and colleagues’ work shows how
important systematic, continental-scale sampling
is, because no one site (and not even
several sites) could tell the entire story of
a perturbation such as the one that occurred 
in 2012. As ecologists attempt to under- 
stand problems at ever larger scales, they will 
increasingly direct their creative energies 
towards problems that require massively more 
data than individual research laboratories can 
collect. Information obtained from infrastruc- 
tural monitoring systems and openly avail- 
able data will therefore have a crucial role in 
advancing the science of climate impacts, as 
they already do in other disciplines.
When sperm meets egg


Sperm-egg binding is mediated by two cell-surface proteins. Structural analysis 
of these proteins, separately and in complex, provides insight into the recognition 
process and the subsequent sperm-egg fusion. SEE LETTERS P.562 & P.566 


KARSTEN MELCHER 


An interaction between two proteins (Izumo1,
which is produced by sperm, and Juno, its
receptor on eggs) enables human fertilization.
However, the details
of this interaction have been elusive. In two 
papers, Aydin et al.¹ (page 562) and Ohto et al.²
(page 566) present the structures of Izumo1,
Juno and the two proteins in complex, deter- 
mined by X-ray crystallography at atomic-level 
resolution. 
Following human copulation, motile sperm 
move towards eggs in the female’s Fallopian 
tubes. The acidic environment of the female 
reproductive tract triggers an activation step,
in which sperm become hypermobile and 
penetrate the outer protective layer of the 
egg. A second activation step occurs when or 
shortly before the sperm binds to the zona pel- 
lucida — the tough inner layer that surrounds 
the egg. During this step, the acrosome — 
an organelle at the tip of the sperm head — 
releases digestive enzymes that break down the 
zona pellucida. This acrosome reaction allows 
the sperm to bind to Juno on the egg mem- 
brane, following which the two cells’ mem- 
branes fuse and the cells merge. In turn, the egg 
releases enzymes that crosslink glycoproteins 
of the zona pellucida to make it impenetrable, 
preventing fertilization by multiple
sperm (polyspermy)”*. 

Izumo1, which is named after a
Japanese marriage shrine, was first 
identified in 2005 by its binding to 
an antibody that blocked sperm-egg 
fusion’. The protein remains concealed 
intracellularly in the inner acrosomal 
membrane until the acrosome reac- 
tion occurs, when the inner membrane 
becomes part of the cell surface. Juno, 
named after the Roman goddess of love 
and marriage, was identified almost a 
decade later® as a membrane-anchored 
protein that is required for female fertil- 
ity, sperm-egg membrane fusion, and 
egg binding by Izumo1. One structure
of mouse Juno has been published this 
year’, and another will soon be pub- 
lished in Nature Communications*. But 
structures of the extracellular domain 
of Izumo1, human Juno and the
Juno-Izumo1 complex have remained 
unknown. 

Juno was originally called folate 
receptor-6, and shares close to 60% 
amino-acid identity with human folate 
receptors’ (receptors for folic acid and 
its derivatives). The structures of Juno 
from mice”* and the current studies 
reveal that the protein has an almost 
identical fold to that of folate recep- 
tors””®: globular, stabilized by eight 
disulfide bonds (S-S) and with a deep, 
ligand-binding pocket. But several key 
amino-acid residues in Juno’s ligand-binding
pocket differ from those of the folate
receptors, consistent with the fact that Juno 
cannot bind folates*. 

Both groups find that the extracellular 
region of Izumo1 has two domains: a four-helix
bundle at the protein’s amino terminus and an
immunoglobulin-like domain at
the carboxy terminus. The two domains are
connected by a hinge region consisting of a 
β-hairpin structure with loops at either end
that are anchored to the two folded domains
by disulfide bonds. The researchers show that
Izumo1 and Juno form a high-affinity complex
in a 1:1 ratio. A surface of Juno distant
from the pocket binds the outside of the hinge
and makes contacts with both Izumo1
domains (Fig. 1).

Ohto and colleagues crystallized structures
of free and Juno-bound Izumo1 in the same
elongated conformation. By contrast, Aydin
et al. report that Izumo1 alone adopts a
boomerang-shaped conformation, in which
the hinge is almost 40° more closed than that
of Juno-bound Izumo1. The authors validated
the approximate shape using a technique
known as small-angle X-ray scattering. This
provides low-resolution structural information
about the protein in solution, thereby
avoiding potential conformational biases that
can arise in X-ray crystallography owing to
crystal packing. These data indicate that the
boomerang-shaped conformation is probably
the predominant conformation of Izumo1
in solution. Moreover, although Juno binds
to the outer hinge surface, the region most
strongly stabilized by this binding seems to
be inside the hinge. This suggests that the
hinge can adopt different positions in Izumo1
alone, but that Juno fixes the conformation of
Izumo1 by simultaneously binding to both
domains.

[Figure 1 label: Izumo1 (Juno-bound)]

Figure 1 | Juno stabilizes the Izumo1 hinge. Aydin et al.¹ and
Ohto et al.² have solved the structures of the human sperm protein
Izumo1 and its egg receptor Juno. Izumo1 is shown in ribbon form
and Juno in a surface representation. Izumo1 consists of two folded
domains on either side of a connecting hinge (orange). When
Izumo1 is in its free state, the hinge is more flexible and may allow
the protein to adopt more-bent conformations than when it is
bound to Juno (possible conformation change indicated by black
arrow). Juno binding stabilizes the hinge, fixing it in an elongated
conformation. This might expose disulfide bonds (S-S; yellow) for
disulfide-exchange reactions to promote Izumo1 dimerization and
subsequent sperm-egg membrane fusion.

Although binding interfaces are typically 
the most evolutionarily conserved surfaces 
of proteins, the Izumo1-Juno interface is less 
conserved than the remainder of either pro- 
tein. Both groups suggest that variation at 
the binding surfaces might contribute to spe- 
cies specificity during fertilization, because 
sperm-egg fusions retain some specificity even 
if the zona pellucida (the main block to cross- 
species fertilization) is removed''. Ohto and 
colleagues introduced genetic mutations into 
mouse Izumo1 that strongly reduced the affinity
of the Izumo1-Juno interaction. Expression
of wild-type Izumo1 in monkey kidney cells 
(which do not normally express Izumo1) ena- 
bled these cells to bind efficiently to mouse 
eggs that lacked the zona pellucida, whereas 
cells that expressed the mutant protein could 
not. These results clearly confirm the interface 
identified in these structures and its
importance in mediating sperm-egg 
docking. 

Why would a protein-binding 
receptor evolve from a folate recep- 
tor? It is tempting to speculate that an 
unidentified, non-folate ligand might 
bind the pocket of Juno to modulate 
the receptor’s activity. Folate receptors 
are exquisitely pH-sensitive and release 
folic acid under acidic conditions”, and 
Ohto et al. demonstrated that slight 
acidification drastically decreased 
Juno’s affinity for Izumo1. Together,
ligand binding and pH changes could 
enable Juno to regulate Izumo1 binding
at multiple levels. 

Although the interaction between 
Izumo1 and Juno in sperm-egg recognition
and adhesion has been structurally
and biophysically characterized,
the transition from initial binding to 
membrane fusion remains unclear. 
Izumo1 stays in the membrane following
binding, whereas Juno is shed.
This shedding might rapidly block 
polyspermy before the slow harden- 
ing of the zona pellucida is completed*. 
Previous work” suggests that Izumo1
undergoes stable dimerization through 
a disulfide-exchange reaction, disso- 
ciating from Juno to enable recruit- 
ment of membrane-fusion machinery. 
Indeed, Ohto et al. provide evidence
that the disulfide bonds in Izumo1 are
easily broken. Perhaps stabilization of
Izumo1 following Juno binding could expose
disulfides for exchange. Testing this hypothesis
and determining how Izumo1-Juno
binding triggers membrane fusion will require
the identification of proteins that bind to
Izumo1 after Juno shedding, and the reconstitution
of events that follow initial binding
in cells.
Searching for the rules that govern hadron construction


Matthew R. Shepherd¹, Jozef J. Dudek²,³ & Ryan E. Mitchell¹


Just as quantum electrodynamics describes how electrons are bound in atoms by the electromagnetic force, mediated by 
the exchange of photons, quantum chromodynamics (QCD) describes how quarks are bound inside hadrons by the strong 
force, mediated by the exchange of gluons. QCD seems to allow hadrons constructed from increasingly many quarks to 
exist, just as atoms with increasing numbers of electrons exist, yet such complex constructions seemed, until recently, 
not to be present in nature. Here we describe advances in the spectroscopy of mesons that are refining our understanding
of the rules for predicting hadron structure from QCD.


While decades of experimental study support QCD as the
underlying theory of quark interactions, a detailed understanding
of the way QCD generates protons, neutrons, and
other strongly interacting ‘hadrons’ remains elusive. The majority of 
observed hadrons fall neatly into only two very limited sets: baryons, 
which are consistent with being three-quark constructions (qqq); and 
mesons, which are quark-antiquark (qq̄) constructions. QCD also appears
to allow constructions featuring larger numbers of quarks as well as had- 
rons built not only from quarks, but also from gluons. This has raised the 
question of why, until possibly now, there has been no evidence for a 
spectrum of such hadrons. Have we just been historically unsuccessful in 
producing these exotic particles in the laboratory, or are there more 
restrictive rules for building hadrons that are not obvious from the 
unsolved equations of QCD? Here we choose to focus specifically on the 
spectrum of mesons, where timely developments in both theory and 
experiment can be used to illustrate how the field of hadron spectroscopy 
addresses fundamental questions about QCD, questions that are common 
to both the meson and baryon sectors. 


Interacting quarks and gluons in QCD 
Within QCD, the ‘charge’ that controls the interactions of quarks is known 
as ‘colour’, and it was the study of the empirical spectrum of hadrons that
first introduced the concept of quarks and their threefold colour charge. 
Interactions in QCD are symmetric under changes of colour, that is, no 
single colour of quark behaves differently from the other two, and impos- 
ing this symmetry on the theory uniquely defines the interactions allowed 
in QCD between the quarks and the force-carrying gluons. Coloured 
quarks can interact by emitting or absorbing gluons, and because they 
carry colour charge themselves, gluons can also emit and absorb gluons. 
Although observations about the spectrum of hadrons inspired the 
fundamental theory of quark interactions, calculating the detailed spec- 
trum from this theory has so far been impossible. The difficulties in 
these calculations stem from the presence of gluon-gluon interactions, 
which make QCD forces very strong on the distance scale of 10⁻¹⁵ m
that characterizes hadrons. This ultimately results in a property called 
‘confinement’, whereby quarks are permanently trapped inside composite
hadrons, making it difficult to isolate the interaction of a single quark 
and antiquark from the collective behaviour of quarks and gluons in the 
hadron. The strong coupling means that, unlike for the electromagnetic 
force, where the exchange of two photons between electrons in an atom is 
far less probable than the exchange of just one, exchange of any number of
gluons between quarks in a hadron is every bit as probable as exchanging
one. Because of this, there is no simple method of calculating the net
effect of interactions between two quarks, and a QCD calculation of the
mass of a hadron, easily measurable by experiment, becomes intractable.


Understanding QCD via rules for building hadrons 

Our inability to solve the equations of QCD is not just a curiosity—it 
restricts our understanding of the behaviour and structure of hadrons, 
owing to the lack of any simple relationship between the fundamental 
quarks and gluons of QCD and the spectrum of hadrons observed 
experimentally. This has motivated the use of heuristic models, or 
‘rules’, that serve as a bridge between QCD and experiment, capturing
the important features of the spectrum while attempting to respect the 
known properties of QCD. The development of a rulebook for con- 
struction of hadrons consistent with both QCD and experimental data 
would arguably define what it means to understand how QCD generates 
hadrons. A uniform set of rules may not exist—there may be no simple 
way to capture the complex behaviour of QCD—but the high degree of 
regularity in the experimental spectrum of hadrons suggests that this is 
not a forlorn hope, and the search for this rulebook drives the field of 
hadron spectroscopy. 

An important area of exploration attempts to create previously unob- 
served classes of hadrons in the laboratory, such as quark-gluon hybrids 
or tetraquarks. From the pattern of such states, or their absence, we can 
refine our understanding of the rules of hadron construction. A second 
area develops techniques for calculating the observable properties of 
hadrons directly from QCD, which will indicate how the rules follow from 
the strong interactions of quarks and gluons prescribed by that theory. 
In what follows we will review the current developments in each of these 
two areas and discuss the prospects for achieving the goal of determining 
the rulebook for hadron construction. 


Rules inferred from experimental data 

We label hadrons by their mass and their quantum numbers J (spin), 
P (parity, behaviour under reflection in a mirror), and C (charge- 
conjugation, behaviour under exchange of particles with antiparticles). 
These properties are directly observable, but other characteristics, such as 
their internal composition, must be inferred. As the number of observed 
hadrons has increased over the last half-century, definite patterns have 
emerged that have led to an initial set of simple rules for the construction 
of hadrons from quarks. 


¹Department of Physics, Indiana University, Bloomington, Indiana 47405, USA. ²Department of Physics, Old Dominion University, Norfolk, Virginia 23529, USA. ³Jefferson Lab, Newport News,
Virginia 23606, USA.


23 JUNE 2016 | VOL 534 | NATURE | 487 


© 2016 Macmillan Publishers Limited. All rights reserved 


REVIEW 


[Figure 1 plot: charmonium level diagram, with states arranged in columns of J^PC and a vertical scale running from 3.0 to about 4.4; labelled levels include 1S, 1P, 2P and 4S.]


Figure 1 | The charmonium spectrum. A qq̄ potential model calculation (coloured) of the charmonium spectrum is compared to experiment (black)*!.
Columns indicate states of common J^PC. Potential model states appear in groups labelled by their radial and orbital angular momentum quantum
numbers, nL (coloured text; n = 1, 2, 3...; L = S, P, D, F...).


The quark-antiquark rule for constructing mesons 

One of the earliest patterns discovered (in the 1960s) was that mesons 
with the same J^PC quantum numbers could be grouped into sets of nine
(‘nonets’) having similar mass. This could be explained by combining a
quark q with an antiquark q̄ if there were three ‘flavours’ of quark, which
were given the names ‘up’, ‘down’ and ‘strange’. The lightest nonet of
mesons has J^PC = 0⁻⁺, and there are heavier nonets with other J^PC values.
It was suggested that the additional mass-energy of the excited hadrons
arises principally from the orbital or radial motion of the quark-antiquark
(qq̄) pair, in analogy to the excitations of a single-electron atom.

With the discovery of charmonium (in the 1970s)¹, this quantum-mechanical
picture became more precise: these new mesons with
masses much larger than those observed earlier were explained as being
bound states featuring a new, heavier quark, which was dubbed ‘charm’.
Charmonium mesons with a range of J^PC values were observed and their
spectrum (Fig. 1) resembles that of a pair of particles bound by a potential.
The large mass of the charm quark justified such an approach, as many
of the complexities of a relativistic system could be neglected. The potential
needed to describe the spectrum was novel, featuring a steady rise at
large distances that would confine the quarks within the meson². A feature
of this model of mesons is that it is not possible for a qq̄ pair in any
orbitally or radially excited state to have J^PC in the set 0⁺⁻, 1⁻⁺, 2⁺⁻, ....
Sets of mesons with these ‘exotic’ quantum numbers were not convincingly
observed experimentally, either in charmonium or for the lighter
quarks, supporting the qq̄ picture.
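
The absence of these ‘exotic’ combinations follows from the standard non-relativistic quark-model assignments for a quark-antiquark pair: P = (-1)^(L+1) and C = (-1)^(L+S), with the quark spins coupled to S = 0 or 1 and J ranging from |L - S| to L + S. These formulas are textbook assumptions brought in here for illustration and are not spelled out in the text above; the function names are hypothetical. A minimal sketch that enumerates the allowed values and lists what is missing:

```python
def allowed_jpc(max_l=6):
    """Enumerate (J, P, C) reachable by a quark-antiquark pair.

    Assumes the usual quark-model rules: total quark spin S is 0 or 1,
    P = (-1)**(L + 1), C = (-1)**(L + S), and |L - S| <= J <= L + S.
    """
    allowed = set()
    for l in range(max_l + 1):
        for s in (0, 1):
            parity = (-1) ** (l + 1)
            c_parity = (-1) ** (l + s)
            for j in range(abs(l - s), l + s + 1):
                allowed.add((j, parity, c_parity))
    return allowed


def exotic_jpc(max_j=3):
    """List J^PC labels with J <= max_j that no quark-antiquark state has."""
    # L up to max_j + 1 is enough to reach every allowed state with J <= max_j.
    allowed = allowed_jpc(max_l=max_j + 1)
    sign = {1: "+", -1: "-"}
    missing = []
    for j in range(max_j + 1):
        for parity in (1, -1):
            for c_parity in (1, -1):
                if (j, parity, c_parity) not in allowed:
                    missing.append(f"{j}{sign[parity]}{sign[c_parity]}")
    return missing


if __name__ == "__main__":
    print(exotic_jpc())
```

Under these assumptions, the combinations with J up to 3 that a quark-antiquark pair cannot form are 0⁺⁻, 0⁻⁻, 1⁻⁺, 2⁺⁻ and 3⁻⁺, which is why a meson observed with one of these labels would point to something beyond the simple quark-antiquark rule.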

Until recently virtually all experimentally observed hadrons could have their
presence explained by a simple rule stating that each meson is constructed
from a qq̄ pair, and each baryon from a three-quark configuration. However,
it has never been at all obvious why QCD is so parsimonious: why are there
not meson-like states of two quarks and two antiquarks (‘tetraquarks’), or
baryon-like states of four quarks and an antiquark (‘pentaquarks’)?
Furthermore, since the gluons of QCD strongly interact just as quarks do,
could we not have ‘hybrid mesons’ in which gluons bind to a qq̄ pair, and
‘glueballs’ that do not require quarks at all? Observation of hadrons like these
would challenge the simple rule outlined above, and indeed, recent experimental
results are casting doubt on how parsimonious QCD really is.


Recent results challenge the qq̄ rule
A powerful way to study the meson spectrum is to collide high-energy
beams of electrons and positrons and to observe the rate at which systems
of hadrons are produced. In this process, the e⁺e⁻ pair first annihilates,
producing a photon; the photon converts into a quark and antiquark,
which then interact, exchanging gluons and perhaps creating more qq̄
pairs; finally, these quarks and gluons arrange themselves into a system
of hadrons that are observed by the particle detector. If the collision
energy is close to the mass of a meson with J^PC = 1⁻⁻ quantum numbers,
the system ‘resonates’, and the probability of a collision increases. Thus, a
plot of the normalized rate of hadron production, the ‘cross-section’,
against the e⁺e⁻ centre-of-mass energy, shows peaks corresponding to
the produced meson states, also known as ‘resonances’ (Fig. 2a). These
excited states exist only briefly before decaying into the set of observed
lighter hadrons, and the width of the peak is inversely related to the
lifetime of the state.
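
The inverse relation between peak width and lifetime can be made concrete with a simple line shape. The sketch below is an illustration rather than the analysis used in the cited experiments: it assumes a non-relativistic Breit-Wigner form and the relation τ = ħ/Γ, and the resonance parameters are hypothetical values chosen only to give round numbers.

```python
HBAR_MEV_S = 6.582119569e-22  # reduced Planck constant in MeV*s


def breit_wigner(energy, mass, width):
    """Non-relativistic Breit-Wigner line shape, normalized to 1 at the peak."""
    half = width / 2.0
    return half**2 / ((energy - mass) ** 2 + half**2)


def lifetime_seconds(width_mev):
    """Mean lifetime tau = hbar / Gamma for a full width Gamma given in MeV."""
    return HBAR_MEV_S / width_mev


# Hypothetical resonance parameters, for illustration only.
mass, width = 4260.0, 100.0  # MeV

for energy in (mass - width, mass - width / 2, mass, mass + width / 2):
    rate = breit_wigner(energy, mass, width)
    print(f"E = {energy:7.1f} MeV  relative rate = {rate:.3f}")

print(f"lifetime = {lifetime_seconds(width):.2e} s")
```

The rate falls to half its peak value at a distance of Γ/2 on either side of the mass, so Γ is the full width at half maximum; doubling Γ halves the lifetime.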

Figure 2a depicts the total rate of hadron production as a function of
the e⁺e⁻ centre-of-mass energy. The peaks are interpreted as evidence for
a series of excited states (the ψ(3770), ψ(4040), ψ(4160) and ψ(4415))
consistent with expectations from the qq̄ picture (see Fig. 1). But recent
experimental advances have allowed a closer inspection. If instead of the
total rate, we look at the rates for the production of specific systems of
hadrons, distinct features appear that have no simple explanation in the
qq̄ picture.

The production rate of the π⁺π⁻J/ψ system, shown in Fig. 2b, provides
one such example. (The J/ψ is a hadron that, for historical reasons, has
two names associated with it, J and ψ.) Here, a prominent peak appears
at 4,260 MeV, which, surprisingly, lies between the masses of the ψ(4160)
and ψ(4415) states. Unlike the ψ(4160) and ψ(4415), this Y(4260) resonance
has no explanation within the qq̄ picture. Another example is the
production rate of the π⁺π⁻ψ(2S) system. The Y(4260) resonance might be
expected also to appear here, since π⁺π⁻J/ψ and π⁺π⁻ψ(2S) are very similar
systems, but it does not. Instead, two peaks appear, for Y(4360) and
Y(4660) (Fig. 2c), in further disagreement with the spectrum suggested
by the total cross-section. These Y states, which appear in addition
to those expected within the qq̄ picture, may be a signal that QCD
does indeed produce mesons with internal structures beyond the
simple qq̄ rule.

The observation of new states in charmonium, which was previously
believed to be well understood, has spurred searches for further exotic
candidate states, observations of which are providing still more challenges
for the simple qq̄ rule. For example, a detailed study of the π⁺π⁻J/ψ
system produced in Y(4260) decays showed that the π±J/ψ system (Fig. 3a)
appears to resonate at a mass of 3,900 MeV, producing an electrically
charged state labelled Z(3900)**. This structure is particularly noteworthy
because its large mass and decay featuring J/ψ suggest that it contains a
charm quark and an anti-charm quark, while its net electric charge
requires further light (up- and down-flavoured) quarks. It is thus a
possible tetraquark. A pattern of such states is beginning to emerge
around 4 GeV: for example, in the π⁺π⁻h_c system, also produced in e⁺e⁻
collisions, another electrically charged structure, Z(4020), appears in the
π±h_c spectrum’ (Fig. 3b) with a somewhat larger mass.

Figure 2 | Electron-positron annihilation cross-sections. a, e⁺e⁻ →
hadrons (refs 62, 63). b, e⁺e⁻ → π⁺π⁻J/ψ (refs 4-6). c, e⁺e⁻ → π⁺π⁻ψ(2S)
(refs 65, 66). The 1⁻⁻ states ψ(3770), ψ(4040), ψ(4160) and ψ(4415),
indicated in a, can be associated with the 1D, 3S, 2D and 4S states of the
potential model of Fig. 1. The error bars represent combined statistical
and systematic uncertainties, taken from the appropriate references. The
enhancements observed in b and c do not line up with these states, which
may indicate that they correspond to new hadron states that do not appear
in the potential model and hence do not obey the qq̄ rule. 1 nanobarn
(nb) = 10⁻³⁷ m².

These new states can also, in principle, be produced in the weak decay of 
heavy mesons containing a bottom quark. Strangely, recent experimental 
data yield no evidence of Z(3900) production in such decays’. Instead,
signals for still further new states of higher mass are observed*"!, 
A related process is the decay of heavy baryons containing a bottom 
quark, and here, equally surprisingly, we find what appears to be a
resonating proton-J/ψ system. This hadron is a possible pentaquark.
Although the origin of these new states is not yet firmly established, they 
present a serious challenge to the simple rules for constructing mesons 
and baryons that we previously believed were obeyed by QCD. 

The pattern of conventional mesons nicely replicates itself for each fla- 
vour of quark: many structures that appear in the spectrum of light quarks 
(up, down, strange) reappear for charm quarks at the 3-GeV scale, and 
again for bottom quarks at the 10-GeV scale. One might also expect that 
any spectrum of hybrids, tetraquarks or other novel constructions should 
have recurrent patterns for different quark flavours. In fact, bottom-quark 
analogues of the charged tetraquark candidates in charmonium have been 
reported!, Historically, these observations preceded those in charmonium. 

Like tetraquarks and pentaquarks, another class of hadrons that appear
to be allowed by the fundamental interactions of QCD are quark-gluon
hybrids, in which gluons and quarks have a role in setting the quantum
numbers of the hadron. A subset of possible hybrid mesons have a unique
experimental signature: exotic J^PC not accessible to a qq̄ pair. While there
are experimental indications of exotic hybrid candidates'*"!8, no firmly
established spectrum of hybrid mesons has been discovered.

The error bars represent statistical uncertainty.

In parallel to the experimental work discussed above, theoretical efforts
are underway to understand whether QCD predicts the existence of
hadrons that go beyond the qq̄ meson and three-quark baryon rule, or
whether the collective behaviour of quarks and gluons excludes the
construction of more exotic combinations. It is to such calculations that our
attention now turns.


Rules derived from QCD 

Much of our understanding of hadrons is informed by models, which may 
be motivated by features of QCD, by empirical observations, or both. A 
goal is to develop an understanding that is based on rigorous calculations 
of the interaction of quarks and gluons through the equations of QCD. 
However, the strongly coupled nature of QCD makes techniques that are 
practical for calculating weak and electromagnetic interactions ineffective 
for predicting properties of hadrons that emerge from QCD. We need a 
different approach, one that utilizes the fact that all fundamental parti- 
cles, including quarks and gluons in QCD, are more correctly thought 
of as fluctuating quantum fields. The quantum aspect of the theory is 
embodied in the fact that observable consequences follow from a sum 
over all possible configurations in space and time that these fields can 
take. The method known as lattice QCD makes the approximation of
considering these fields on a discrete grid of points describing a restricted 
region of space-time. A finite, but large, number of possible configura- 
tions of the fields on this grid can be generated using random sampling on 
a computer, and a good approximation for observable hadron properties 
obtained. The volume of the grid and number of field configurations 
required to achieve useful precision demands substantial computational 
resources. Total computational times of several teraflop-years are not 
unusual for contemporary calculations, with such efforts making use 
of ‘leadership-class’ supercomputing facilities—future precision lattice 
QCD calculations of increased sophistication will require petaflop-scale 
machines. 
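
The idea of sampling field configurations on a discrete grid can be illustrated with a far simpler system than QCD. The sketch below is a toy, not a lattice QCD calculation: it assumes a one-dimensional quantum harmonic oscillator (unit mass and frequency) on a periodic lattice in Euclidean time and uses the Metropolis algorithm to generate paths weighted by exp(-S). All function names and parameter values are hypothetical choices made for this illustration.

```python
import math
import random


def local_action_change(path, i, new_value, spacing):
    """Change in the Euclidean action when site i is set to new_value.

    Action (mass = frequency = 1, periodic boundary):
    S = sum_i [(x[i+1] - x[i])**2 / (2 * spacing) + spacing * x[i]**2 / 2]
    """
    n = len(path)
    left, right, old = path[i - 1], path[(i + 1) % n], path[i]
    kinetic = ((new_value - left) ** 2 + (right - new_value) ** 2
               - (old - left) ** 2 - (right - old) ** 2) / (2.0 * spacing)
    potential = spacing * (new_value**2 - old**2) / 2.0
    return kinetic + potential


def metropolis_x_squared(n_sites=32, spacing=0.5, sweeps=4000,
                         thermalize=500, step=1.0, seed=1):
    """Estimate <x^2> by Metropolis sampling of lattice paths."""
    rng = random.Random(seed)
    path = [0.0] * n_sites
    total, count = 0.0, 0
    for sweep in range(thermalize + sweeps):
        for i in range(n_sites):
            proposal = path[i] + rng.uniform(-step, step)
            d_s = local_action_change(path, i, proposal, spacing)
            # Accept with probability min(1, exp(-dS)).
            if d_s <= 0.0 or rng.random() < math.exp(-d_s):
                path[i] = proposal
        if sweep >= thermalize:
            total += sum(x * x for x in path) / n_sites
            count += 1
    return total / count


estimate = metropolis_x_squared()
# Expected <x^2> for this discretized action in the limit of a long lattice.
exact = 1.0 / (2.0 * math.sqrt(1.0 + 0.5**2 / 4.0))
print(f"Monte Carlo <x^2> = {estimate:.3f}, lattice expectation ~ {exact:.3f}")
```

The same pattern (propose a local change, accept it according to the change in the action, average an observable over the retained configurations) underlies the far more expensive calculations described above, where the variables are quark and gluon fields on a four-dimensional grid.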

Lattice QCD has been applied with substantial success to a broad range 
of processes involving hadrons’, including the spectrum and internal 
structure of the lightest hadrons, the behaviour of hadrons at non-zero 
temperature, relevant in collisions of heavy ions?!, and heavy flavour 
decays, in which a heavy quark confined inside a meson decays through 


the weak interaction”. 


Lattice QCD as a tool for hadron spectroscopy 

Our interest is in the determination of properties of excited hadrons, 
where obtaining a high degree of numerical precision is an issue that is 
secondary to the more basic question of whether certain states exist or do 
not. In the past few years we have seen excellent progress in overcoming 
the challenges posed by these calculations. Exploration of the excited 
hadron spectrum is possible using an approach in which each state in the 
spectrum is produced by a different combination of quark and gluon field 
constructions, and for this method to be successful, a large set of possible 
constructions is required. The dynamics of QCD, implemented by the 
sum over possible field configurations, determines which combination 
of constructions is present in each state in the spectrum. A scheme outlined
in refs 23 and 24 includes many constructions resembling qq̄ pairs
with various orbital motions and radial wavefunctions, motivated by the
success of the qq̄ rule in describing the experimental hadron spectrum.
More elaborate structures are possible, though, and refs 23 and 24 
included several that feature the gluon field in a non-trivial way, inspired 
by the possibility that hybrid mesons may be allowed by QCD. 

This large set of constructions, coupled with advances in computa- 
tional techniques”, and the application of state-of-the-art computing 
hardware*®*”’, led to the pioneering results presented in Fig. 4 for the 
spectrum of mesons constructed from light up and down quarks. The 
computational challenges of these calculations currently require the 
utilization of masses for the lightest quarks that are heavier than the phys- 
ical up and down quark masses, which leads to a systematic shift in the 
computed meson masses. However, since the immediate goal is to under- 
stand the underlying QCD dynamics by studying the pattern of states, 
rather than precisely to predict the mass of each meson, the computed 
spectrum allows us to develop intuitive rules for constructing hadrons 
that generally apply for quarks of any mass. 
Figure 4 | Lattice QCD computation of the meson spectrum. The spectrum
is computed with light-quark masses such that m_π = 392 MeV (ref. 67).
The spectrum features sets of states compatible with the nL assignments
of a qq̄ model (see Fig. 1), but also (shown in blue) states that do not
have a place in such a model. These states can be interpreted as hybrid
mesons in which a qq̄ pair is partnered with an excitation of the gluon
field**. Their presence suggests a new rule of hadron construction that
includes gluons. (The height of each box represents the estimated
uncertainty in the calculation.)
The spectrum presented in Fig. 4 qualitatively reproduces many of the
features of the experimental light meson spectrum, and further it reflects
the simple picture of qq̄ mesons, with the bulk of the states fitting into the
pattern expected for states excited with increasing amounts of orbital
angular momentum and/or excitations in the radial quantum number.
There are some notable exceptions to this pattern, however: in particular,
the 0⁻⁺, 1⁻⁻ and 2⁻⁺ states between 2.1 GeV and 2.4 GeV do not have an
obvious explanation, and most strikingly there is a clear spectrum of states
with exotic J^PC = 1⁻⁺, 0⁺⁻ and 2⁺⁻, which cannot be constructed from a
qq̄ pair alone.

These additional mesons, which go beyond the set predicted by the qq̄
rule, have a natural explanation as quark-gluon hybrid mesons.
Previously, estimates for the spectrum of hybrid mesons came only from 
models, which made educated guesses for the behaviour of the strongly 
coupled gluons inside a hadron. Different guesses led to very different 
predictions for the number and mass of hybrid states”***, Using the lattice 
QCD technique, we are now able to predict a definitive pattern of states 
directly from the fundamental interactions as prescribed by QCD. Further 
calculations*>-*’, performed with larger values of the quark mass, up as 
high as the charm quark mass, show the same pattern of hybrid mesons, 
and they are found to be consistently 1.3 GeV heavier than the lightest
J^PC = 1⁻⁻ meson. The particular pattern of states and the simple mass gap
lead to a new rule of hadron construction for hybrid mesons, namely:
combine qq̄ constructions with a gluonic field that has J^PC = 1⁺⁻ and a
mass of about 1.3 GeV to form the spectrum of hybrid mesons in QCD.
This is the first example of a rule following from a QCD calculation rather 
than being inferred from experimental observations*®. 
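
The quantum numbers implied by this rule can be worked out by simple bookkeeping. The sketch below is an illustration under stated assumptions, not the lattice calculation itself: it takes the lightest (S-wave) qq̄ states, 0⁻⁺ and 1⁻⁻, couples each to a 1⁺⁻ gluonic excitation with no relative orbital motion, adds the angular momenta in the usual way and multiplies P and C. The names `couple`, `label`, `GLUONIC_FIELD` and `S_WAVE_QQBAR` are hypothetical.

```python
def couple(state_a, state_b):
    """Combine two (J, P, C) states with no relative orbital motion.

    J follows the usual angular momentum addition rule
    (|Ja - Jb| ... Ja + Jb), while P and C multiply.
    """
    j_a, p_a, c_a = state_a
    j_b, p_b, c_b = state_b
    return [(j, p_a * p_b, c_a * c_b)
            for j in range(abs(j_a - j_b), j_a + j_b + 1)]


def label(state):
    """Format (J, P, C) as a string such as '1-+'."""
    sign = {1: "+", -1: "-"}
    j, p, c = state
    return f"{j}{sign[p]}{sign[c]}"


GLUONIC_FIELD = (1, +1, -1)                 # J^PC = 1+- excitation from the text
S_WAVE_QQBAR = [(0, -1, +1), (1, -1, -1)]   # assumed lightest qq-bar states

hybrids = sorted(
    label(h) for q in S_WAVE_QQBAR for h in couple(q, GLUONIC_FIELD)
)
print(hybrids)
```

Under these assumptions the lightest hybrids have J^PC = 0⁻⁺, 1⁻⁺, 1⁻⁻ and 2⁻⁺: a 1⁻⁻ state accompanied by (0, 1, 2)⁻⁺ partners, one of which (1⁻⁺) is exotic.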

Of course this rule must be verified by producing and studying hybrid 
mesons in the laboratory, and many current and near-future experiments 
include searches for these states in their programmes. Some hybrid meson
candidates have already been observed experimentally in both the light
meson sector!*-!8 and in the charm region. For example, the Y(4260)
discussed in the previous section has J^PC = 1⁻⁻, approximately the right
mass relative to the J/ψ, and it seems to appear in addition to the expected
qq̄ excitations. The new rule of hybrid meson construction would have
this meson partnered with states of J^PC = (0, 1, 2)⁻⁺ at a similar mass.
Searches for these states are underway. 


Calculating how hadrons decay 
These calculations of the excited meson spectrum within QCD represent 
a major step forward in our understanding of hadron spectroscopy, but 
they still make approximations that fail to capture an important feature 
of excited hadrons—that they are resonances, decaying rapidly to lighter 
hadrons. As can be seen in Figs 2 and 3, in simple cases, excited states 
appear as characteristic peaks in the rate of observation of certain final- 
state mesons, and lattice QCD calculations should be capable of repro- 
ducing this behaviour. 

Experimentally, resonances are often observed to decay preferentially
into certain sets of mesons and not others, and these patterns can be used
to infer details of the resonant state's internal structure. To be able to
calculate the decay properties of the excited hadrons directly from QCD
would provide a powerful tool for interpreting experimental data. In the
case of predictions of previously unobserved excitations, it may also provide
suggested decay channels to be examined in experimental searches.

Figure 5 | Calculation of the ρ resonance. The cross-section (in arbitrary
units) is shown for ππ → ππ with 1⁻⁻ quantum numbers, calculated using
lattice QCD with light-quark masses such that m_π = 236 MeV (ref. 68).
The ρ resonance is clearly observed as a peak, and from its position and
width the mass and decay rate of this excited state can be extracted.
The errors represent the uncertainty in the calculation.

Extending the calculations described above to account correctly for the 
decay of excited states is possible*?“° but challenging, and serious efforts 
have only recently begun*”-**. As an example of what can be achieved, 
in Fig. 5 we present the cross-section for two pions forming the lightest
1⁻⁻ resonance, known as the ρ, and then decaying back into two pions.
A clear peak is observed, whose position and width provide the mass and
decay rate of the ρ.

These rapidly maturing theoretical techniques will be required to 
study the new charmonium mesons, discussed earlier, within QCD. 
The observed enhancements are seen only in specific final states, which 
implies that the ability to predict how hadrons decay directly from QCD 
will be an essential component in interpreting experimental data in the 
quest to develop the rules for constructing hadrons. 


Towards a unified set of rules 

Much of what we know about what emerges from strongly coupled QCD 
has come from studying patterns of hadrons organized by mass and 
quantum numbers like J, P and C. These patterns suggest quarks of 
several flavours which may be combined with a single antiquark to form 
mesons—a rather simple rule of hadron construction. The theory of 
QCD is not limited to such simple constructions, however, and making 
a definitive statement about the existence of mesons with four-quark or 
quark-gluon hybrid structure will require observing a spectrum of additional
mesons that cannot be explained by the qq̄ rule. In particular, we
will need to observe a set of states with unusual flavour and/or J^PC values.


Finding a pattern of hadrons is essential 

Contemporary technology has enabled experimental investigations at 
an unprecedented level of statistical precision, which provides the capa- 
bility to discover more rare and interesting phenomena. However, we 
must exercise great care when we attempt to interpret experimental 
data. For example, one needs to be certain that the same logic that allows 
one to deduce the presence of conventional charmonium mesons in 
the total hadronic cross-section also applies when one is examining the 
cross-section for a single exclusive process that is two orders of magnitude 
smaller (see, for example, Fig. 2). Such precise experimental data make 
one susceptible to effects that can mimic the experimental signature of 
a new hadron, but which in fact may have a more prosaic origin®->”. 
This underscores the importance of experimentally establishing a pat- 
tern of hadrons: the interpretation of any single state as a new and exotic 
construction will certainly be questioned. However, the experimental 
observation of an ordered spectrum of states is harder to dismiss as a 
misinterpreted experimental artefact. 

Likewise, theoretical efforts in lattice QCD must continue in their 
attempts to compute the complete set of possible hadrons allowed by 
QCD, and to identify patterns of states within that spectrum. Recent 
advances have enabled us to develop a simple rule, stated in subsection
‘Lattice QCD as a tool for hadron spectroscopy’, that describes how QCD
constructs hybrid mesons and baryons, in an extension of what we had 
already for conventional mesons and baryons—this new rule must be 
verified by observing an experimental spectrum of hybrids. Lattice QCD 
can also be used to calculate decay properties of hadrons, and identifying 
particular characteristic decays of hybrid mesons will guide experimental 
searches and aid in interpretations of data. As has been done with hybrids, 
lattice QCD needs to determine whether QCD predicts a spectrum of 
tetraquark and pentaquark states. A particular priority is in the heavy 
quark sectors, where, as we have discussed above, there is recent experi- 
mental evidence for such objects. The ability within lattice QCD to vary 
arbitrarily the mass of the quarks allows us to identify how the rules of 
hadron construction vary, and to identify possible common behaviours 
between the heavy charmonium system and the lighter mesons. 


A global experimental programme 

Establishing a spectrum of hadrons beyond those described by the simple
qq̄ and qqq rules will require the combined efforts of multiple present and
future experiments. There is a spectroscopy programme within nearly 
every particle physics collaboration worldwide. We list the details of a 
selection of several past, present, and future experiments, primarily those 
whose work is referenced in this article, in Box 1, as an illustration of the 
breadth of the worldwide effort. 


BOX 1
Hadron spectroscopy experiments

A selection of experiments and their hadron spectroscopy programmes, which typically represent only a fraction of each collaboration’s research efforts.

BaBar (Menlo Park, California, USA): e⁺e⁻ collisions at bottomonium energies; discoveries of the Y(4260) and Y(4360); finished collecting data in 2008.
Belle (Tsukuba, Japan): e⁺e⁻ collisions at bottomonium energies; discovery of the X(3872), Z(3900), Z(4430), and Z_b states; finished collecting data in 2010.
Belle II (Tsukuba, Japan): an upcoming continuation of the Belle experiment that will provide much higher intensity e⁺e⁻ collisions than achieved at Belle.
BESIII (Beijing, China): e⁺e⁻ collisions at charmonium energies; direct production of the Y(4260); discovery of the Z(3900) and Z(4020); ongoing.
COMPASS (Geneva, Switzerland): high-intensity meson beams on nuclear targets; searches for unusual light-quark mesons; discovery of the a₁(1420); ongoing.
GlueX (Newport News, Virginia, USA): polarized photon beam on a nuclear target; searches for light-quark hybrid mesons; data collection is beginning now.
LHCb (Geneva, Switzerland): high-energy, high-intensity proton-proton collisions, specializing in B-meson decays; measurement of resonant nature of the Z(4430); discovery of pentaquark candidates; ongoing.
PANDA (Darmstadt, Germany): proton-antiproton collisions at charmonium energies; exploration of charmonium and light-quark mesons; upcoming.
Most of the recently observed new hadrons have so far been observed
in only a single production or decay process. Observation of the same
state in multiple production and decay modes almost certainly rules out
a misinterpretation of experimental data due to some process-dependent
phenomenon and solidifies the evidence for a new hadron. Therefore the
best current routes to explore new states in the charm sector are by
comparing results from e⁺e⁻ collisions (BESIII, Belle, Belle II) and production
in B-meson decay (LHCb, Belle, Belle II). Supplementing these with
results from novel production mechanisms, such as proton-antiproton
annihilation (PANDA), would be extremely valuable.

Experiments aimed at exploring different energy regimes and quark 
flavours are essential for a complete understanding of the meson 
spectrum, as we expect the underlying patterns of states to be independent 
of quark mass. A variety of present and future experiments will allow
access to both the charmonium system (BESIII, Belle, Belle II, LHCb), and
the analogous system of bottom quarks, bottomonium (Belle, Belle II).
Mesons constructed from light quarks can be produced in decays of 
heavier mesons and therefore can be studied at all of the previously men- 
tioned facilities; they can also be produced at experiments dedicated to 
the study of lighter systems (COMPASS, GlueX). Discovery of light-quark 
hybrids would suggest the existence of heavy-quark hybrids and further 
motivate dedicated searches for these states. 

With continued coordinated experimental and theoretical investiga- 
tions we hope to define a complete set of rules for building hadrons that 
both describes what is observed in nature and can be derived directly 
from QCD. In doing so, we aim to understand how what seems to be a 
simple spectrum of hadrons emerges from the complex interactions of 
quarks and gluons in QCD. 
Selective spider toxins reveal a role for
the Nav1.1 channel in mechanical pain


Jeremiah D. Osteen!, Volker Herzig’, John Gilchrist*, Joshua J. Emrick!, Chuchu Zhang!, Xidao Wang*, Joel Castro*®, 
Sonia Garcia-Caraballo*®, Luke Grundy”, Grigori Y. Rychkov®, Andy D. Weyer’, Zoltan Dekan?, Eivind A. B. Undheim?, 
Paul Alewood?, Cheryl L. Stucky’, Stuart M. Brierley*®, Allan I. Basbaum‘, Frank Bosmans’, Glenn F. King? & David Julius! 


Voltage-gated sodium (Nav) channels initiate action potentials in most neurons, including primary afferent nerve fibres
of the pain pathway. Local anaesthetics block pain through non-specific actions at all Nav channels, but the discovery
of selective modulators would facilitate the analysis of individual subtypes of these channels and their contributions
to chemical, mechanical, or thermal pain. Here we identify and characterize spider (Heteroscodra maculata) toxins
that selectively activate the Nav1.1 subtype, the role of which in nociception and pain has not been elucidated. We use
these probes to show that Nav1.1-expressing fibres are modality-specific nociceptors: their activation elicits robust pain
behaviours without neurogenic inflammation and produces profound hypersensitivity to mechanical, but not thermal,
stimuli. In the gut, high-threshold mechanosensitive fibres also express Nav1.1 and show enhanced toxin sensitivity in a
mouse model of irritable bowel syndrome. Together, these findings establish an unexpected role for Nav1.1 channels in
regulating the excitability of sensory nerve fibres that mediate mechanical pain.


Pain is a multimodal system in which the activation of functionally
distinct sensory nerve fibres elicits acute, protective reflexes as well
as maladaptive responses that contribute to persistent pain’. In these
nociceptive neurons, three voltage-gated sodium (Nav) channels,
Nav1.7, Nav1.8 and Nav1.9, have garnered particular attention because
mutations affecting these subtypes are associated with insensitivity
to pain or persistent pain syndromes”*. Nav1.1 is also expressed by
somatosensory neurons’~!°, but no link has been established between
this subtype and nociception". Mutations affecting Nav1.1 are associated
with CNS disorders such as epilepsy!23, autism!4, and Alzheimer
disease’, and these clinically dominant phenotypes may have masked
functions of this subtype in peripheral neurons. For example,
gain-of-function mutations in Nav1.1 underlie familial hemiplegic migraine
type 3 (ref. 16), and although this phenotype has been ascribed to a
CNS-initiated mechanism”, dysfunction in sensory neurons may also
contribute to this pain syndrome.

Another challenge for identifying roles for Nav1.1 in pain is developing
subtype-selective drugs for any member of this highly conserved
family of ion channels'*®. Natural products can be exploited
as a source of evolutionarily honed agents that target receptors with
exquisite specificity. Such agents may be found in complex venoms
from spiders, scorpions, cone snails, and snakes; they include toxins
that excite sensory nociceptors to elicit pain or discomfort in offending
predators'®°. Here we describe two algogenic tarantula toxins
that selectively activate Nav1.1 to elicit acute pain and mechanical
allodynia, providing new insights into specific roles of this channel
and Nav1.1-expressing sensory nerve fibres in nociception and pain
hypersensitivity.


Selective Nav1.1-activating toxins
To identify novel toxins that target nociceptors, we used calcium imaging
to screen more than 100 spider, scorpion and centipede venoms
for the ability to activate cultured somatosensory neurons. Venom
from the tarantula Heteroscodra maculata (Fig. 1a) robustly excited a
subset of neurons from the trigeminal or dorsal root ganglia (DRG) of
mice or rats. Venom fractionation yielded two active peptides, which
were identified by matrix-assisted laser desorption/ionization-time-of-flight
mass spectroscopy (MALDI-TOF MS) and Edman sequencing
as inhibitor cystine knot (ICK) peptides with related sequences
(Extended Data Fig. 1a). We named these toxins δ-theraphotoxin-Hm1a
(Hm1a) and δ-theraphotoxin-Hm1b (Hm1b). Application
of synthetic Hm1a to rat DRG neurons likewise triggered calcium
responses (Fig. 1b), validating Hm1a as an active venom component.
All subsequent experiments were performed with synthetic Hm1a
peptide unless otherwise stated.

Tetrodotoxin (TTX) blocked Hm1a-evoked calcium responses
(Fig. 1b), suggesting that these responses involved Nav channels.
Indeed, whole-cell patch-clamp recordings from trigeminal neurons
showed that Hm1a robustly inhibited Nav current inactivation (Fig. 1c).
Among the Nav subtypes expressed by these neurons, only Nav1.1, 1.6
and 1.7 are sensitive to TTX2!, narrowing our search. We next tested
ICA-121431, a small molecule inhibitor with selectivity for the Nav1.1
and Nav1.3 subtypes”? (Extended Data Fig. 1b), and found that it
greatly diminished Hm1a-evoked calcium responses in both embryonic
DRG and postnatal day (P)0 mouse trigeminal cultures (Fig. 1d
and Extended Data Fig. 1c, d), suggesting that Nav1.1 is the main target
of Hm1a in somatosensory neurons. In contrast, ICA-121431 only
partially blocked responses to SGTx1, an Hm1a-related peptide that
shows little selectivity among Nav subtypes”? and excites a larger cohort
of sensory neurons compared to Hm1a (Extended Data Fig. 1c, d).
To confirm that the toxin was selective for Nav1.1 channels, we
heterologously expressed Nav1.1-Nav1.8 α subunits in Xenopus oocytes.
Hm1a potently inhibited inactivation of human (h)Nav1.1 channels
(half-maximum effective concentration (EC50) = 38 ± 6 nM), but had
Figure 3 | Nav1.1 is expressed by myelinated, non-C-fibre neurons
in sensory ganglia. a, Representative DRG sections showing
immunoreactivity for neurofilament 200 (NF200), binding of isolectin B4
(IB4), and in situ histochemistry for TRPV1, Nav1.7, or Nav1.1 transcripts,
as indicated. Arrows and asterisk indicate cells with overlapping and
non-overlapping signals, respectively. b, Size distribution for all DRG neurons
(grey bars, 514 cells counted) or Nav1.1-expressing cells (black bars, 324
cells counted). c, Quantification of overlap between histological markers


channel, which is normally insensitive to the toxin (Extended Data
Fig. 2a). Transfer of just the DIV S3b-S4 region rendered rKv2.1 sensitive
to Hm1a, demonstrating that this segment is a primary determinant
of toxin action (Fig. 2b and Extended Data Fig. 2b). However,
this region is identical or highly conserved in hNav1.1, hNav1.2 and
hNav1.3, and thus cannot fully account for toxin selectivity. To identify
other functionally important regions, we constructed chimaeras
between Nav1.1 and Nav1.4, which is completely insensitive to Hm1a.
Replacement of the S3b-S4 region of rNav1.4 with that of hNav1.1
did not confer toxin sensitivity, whereas transfer of both S3b-S4 and
the S1-S2 loop resulted in full toxin sensitivity (Fig. 2c and Extended
Data Fig. 2c-e). These results indicate that both domains together
determine toxin sensitivity and subtype selectivity, consistent with
previous suggestions that S1-S2 contributes to toxin recognition sites
on voltage sensors”””®,


NaV1.1 is found on myelinated Aδ fibres 

Using in situ hybridization histochemistry, we found that NaV1.1 
transcripts were expressed primarily by medium-diameter sensory 
neurons (constituting 35% of all neurons within the DRG), most of which 
(>75%) belong to the myelinated (NF200-positive) cohort (Fig. 3). In 
contrast, we observed limited (5-11%) overlap of NaV1.1-positive cells 
with markers of small-diameter, unmyelinated neurons, including the 
transient receptor potential cation channel subfamily V member 1 
(TRPV1), calcitonin gene-related peptide (CGRP), tyrosine hydroxylase, 
and the lectin IB4. However, we did see substantial co-expression 
with the 5-HT3 receptor, a marker of lightly myelinated Aδ neurons 
(43% of NaV1.1-positive cells expressed 5-HT3). Finally, 22% of 
NaV1.1-positive cells also expressed the cold/menthol receptor TRPM8, which 
is found in both C and Aδ fibres. From these findings, we conclude 
that NaV1.1 is expressed primarily by myelinated neurons, including 
Aδ fibres, consistent with previous histological and transcriptome 
profiling data. We also characterized Hm1a-sensitive neurons for 


496 | NATURE | VOL 534 | 23 JUNE 2016 


[Figure 3 graphics. b, Neurons (%) versus cross-sectional area (µm²) for total and NaV1.1+ cells. c, Fraction co-labelled (Marker+/NaV1.1+ and NaV1.1+/Marker+). d, Vehicle traces (scale, 50 mN) and a plot against force (mN).] 

(>164 cells counted for each condition; 9-12 independent sections 
from >3 mice). d, Representative traces from mechanonociceptive A 
fibres recorded in skin-nerve preparation show increased firing following 
application of Hm1a (1 µM) with quantification on the right. Hm1a markedly 
increased firing during all forces tested, achieving statistical significance 
at 50 and 100 mN (***P < 0.001 with two-way ANOVA, #P < 0.05 with 
Bonferroni post hoc test; n = 23, 23 and 18 fibres for vehicle and 13, 13 
and 10 fibres for Hm1a at 15, 50 and 100 mN forces, respectively). 


responses to other receptor-selective agonists (Extended Data Fig. 3b), 
further confirming this conclusion. Notably, most (>85%) 
NaV1.1-positive cells also expressed NaV1.7, suggesting that this population of 
myelinated neurons contributes to nociception (see below). 
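
The overlap percentages reported here depend on which population serves as the denominator, which is why Fig. 3c plots two fractions per marker. The following is a minimal Python sketch of that bookkeeping. The function name `colabel_fractions` and the cell counts are hypothetical and serve only as illustration; they are not data from this study.

```python
def colabel_fractions(n_marker, n_nav, n_both):
    """Return the two directional overlap fractions for a marker and NaV1.1.

    n_marker: cells positive for the marker
    n_nav:    cells positive for NaV1.1
    n_both:   cells positive for both
    """
    if n_both > min(n_marker, n_nav):
        raise ValueError("double-labelled cells cannot exceed either population")
    frac_of_marker = n_both / n_marker  # share of marker+ cells that are NaV1.1+
    frac_of_nav = n_both / n_nav        # share of NaV1.1+ cells that are marker+
    return frac_of_marker, frac_of_nav

# Hypothetical counts, for illustration only.
frac_of_marker, frac_of_nav = colabel_fractions(n_marker=200, n_nav=100, n_both=43)
print(f"{frac_of_marker:.1%} of marker+ cells are NaV1.1+")
print(f"{frac_of_nav:.1%} of NaV1.1+ cells are marker+")
```

With the same 43 double-labelled cells, the two fractions differ because the marker-positive and NaV1.1-positive populations differ in size.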

We next investigated the effect of Hm1a on mechanonociceptive 
A fibres using the ex vivo skin-nerve preparation. We found that 
application of 1 µM Hm1a to cutaneous receptive fields significantly 
increased the firing rate in mechanonociceptive A fibres in response 
to mechanical stimuli (Fig. 3d), confirming expression of functional 
NaV1.1 channels in this afferent population. Previous studies found 
limited expression of TRPV1 in mechanonociceptive Aδ fibres, 
consistent with our finding that there was limited overlap between 
NaV1.1 and TRPV1 expression. Together, these functional data 
confirm our histological assignment of NaV1.1 expression to myelinated 
Aδ fibres, and further suggest that NaV1.1 participates in mechanical 
nociception. 


Hm1a elicits pain and mechanical hypersensitivity 

We investigated whether activation of NaV1.1-expressing fibres 
produced pain behaviours. Injection of Hm1a (5 µM in 10 µl) into the 
mouse hind-paw elicited immediate and robust nocifensive responses 
(bouts of licking or biting of the injected paw) throughout the 
observation period (Fig. 4a). Toxin injection also significantly increased 
Fos immunoreactivity in dorsal horn neurons of the superficial 
lamina ipsilateral to the injection, signifying functional engagement of 
nociceptors and their central connections (Fig. 4b). To exclude the 
possibility that this response depends on the small population of fibres 
that co-express TRPV1 and NaV1.1, we ablated TRPV1-positive 
terminals by intrathecal (spinal) injection of capsaicin; Hm1a-evoked 
nocifensive behaviour persisted in these mice (Fig. 4a). Notably, Hm1a 
did not produce swelling or plasma extravasation of the injected paw, 
a neurogenic inflammatory response readily provoked by activation of 
peptidergic C-fibre nociceptors that include most TRPV1-expressing 
Figure 4 | Hm1a elicits non-inflammatory pain and bilateral 
mechanical allodynia. a, Comparison of licking or biting behaviour 
(nocifensive behav.) following intraplantar injection (10 µl) of vehicle 
(veh., PBS) (n = 6) versus Hm1a (5 µM) (n = 10, **P < 0.01). Behaviour 
was unaffected by ablation of TRPV1 fibres (V1abl, n = 5) but significantly 
reduced in peripherin-Cre × floxed-NaV1.1 (CKO) mice (*P < 0.05, 
n = 11). b, Top, representative histological sections and quantification of 
c-Fos immunoreactivity in spinal cord dorsal horn following intraplantar 
vehicle or Hm1a (5 µM) injection (n = 27 sections from three mice, 
***P < 0.001). c, Capsaicin- or Hm1a-injected paws (right) next to 
uninjected contralateral controls (left). Top right, relative thickness of 
injected versus uninjected paws. Bottom right, Evans blue dye (EBD) 
extravasation following capsaicin or Hm1a injection (*P < 0.05). 


neurons (Fig. 4c). These results further suggest that Hm1a elicits pain 
by activating a non-peptidergic subset of myelinated sensory fibres. 

Genetic or pharmacological elimination of TRPV1-expressing fibres 
greatly diminishes sensitivity to noxious heat, but does not perturb 
sensitivity to mechanical stimuli. In light of the anatomical and 
physiological results described above, we tested whether Hm1a has 
differential effects on these modalities by monitoring responses to 
thermal and mechanical stimuli following intraplantar injection of 
toxin at a dose (500 nM in 10 µl) insufficient to elicit acute behaviour. 
Intraplantar injection of Hm1a did not alter sensitivity to heat, but 
produced robust sensitization to mechanical stimulation that was 
not dependent on TRPV1-expressing fibres (Fig. 4d, e). Equivalent 
mechanical sensitization was also observed following injection of 
native Hm1b peptide (Fig. 4e). Consistent with these behavioural 
observations, we found that all Hm1a-responsive adult DRG neurons 
displayed mechanically activated currents, except for those neurons 
that were also sensitive to capsaicin (Fig. 4f). 

To confirm that NaV1.1 is required for toxin-evoked behaviours, we 
crossed mice bearing a floxed NaV1.1 allele to a line expressing Cre 
recombinase under the control of the peripherin promoter, which is 
active in a large percentage of unmyelinated and myelinated sensory 
neurons during development. Analysis of a peripherin-Cre × yellow 
fluorescent protein (YFP) reporter line showed that these mice 
expressed Cre recombinase in 46% of NaV1.1-positive cells (Extended 
Data Fig. 4a, b). Notably, elimination of NaV1.1 from this subset of 
fibres significantly attenuated toxin-evoked behaviours, including both 
acute nocifensive responses and mechanical sensitization (Fig. 4a, e). 

Robust activation of nociceptive pathways by nerve injury or 
inflammation can trigger both primary and secondary sensitization, the latter 
of which can manifest as mechanical or heat hypersensitivity 
contralateral to the insult. In fact, we found that unilateral injection 


d, Latency of paw withdrawal (WD) from noxious heat stimulus measured 
after intraplantar injection of vehicle or Hm1a (500 nM). e, Normalized 
mechanical response thresholds measured in paws ipsilateral (light grey) 
or contralateral (dark grey) to vehicle or toxin (500 nM) injection 
(n = 5 for wild type (WT) Veh., V1abl Hm1a and WT Hm1b; n = 7 for WT 
Hm1a; n = 9 for CKO Hm1a; **P < 0.01, ***P < 0.001, ****P < 0.0001). 
f, Mechanically evoked currents were observed from all adult mouse 
DRG neurons exhibiting sensitivity to Hm1a but not capsaicin (bottom), 
and not from those sensitive to both (top) (stimulus range 1-9 µm 
displacement). Kinetic properties of mechanically evoked currents in 
Hm1a responders were variable. Error bars represent mean ± s.e.m. 
P values based on unpaired two-tailed Student’s t-test (b, c) or one-way 
ANOVA with post hoc Tukey’s test (a, d, e). 


of Hm1a produced robust and equivalent mechanical sensitization of 
both the injected and contralateral paws (Fig. 4e). This contralateral 
sensitization was also modality-specific, as no change in heat 
sensitivity was observed (Fig. 4d). Importantly, Hm1a-mediated mechanical 
sensitivity was equivalently reduced in the ipsilateral and contralateral 
paws of NaV1.1-peripherin Cre mice, demonstrating that the 
contralateral effects depend on NaV1.1 (Fig. 4e). As we did not observe signs 
of neurogenic inflammation, we investigated whether this phenotype 
resulted from Hm1a-mediated nerve injury. However, this seems 
unlikely because toxin injection failed to induce expression of ATF3, 
a marker of nerve damage (Extended Data Fig. 4c). Together, these 
observations demonstrate that direct activation of NaV1.1-expressing 
fibres is sufficient to produce robust and modality-specific bilateral 
sensitization. 


NaV1.1 and irritable bowel syndrome 

Chronic mechanical hypersensitivity underlies the development of 
abdominal pain in patients with irritable bowel syndrome (IBS). 
Given the apparent role of NaV1.1 in mechanonociception, we 
investigated whether this channel is expressed by mechanically 
sensitive fibres of the gut, and, if so, whether it contributes to 
neuronal sensitization in a model of chronic visceral hypersensitivity 
(CVH). We examined mechanical responses in ex vivo gut-nerve 
preparations from healthy and CVH mice. In preparations from 
healthy mice, Hm1a increased mechanically evoked spiking in a 
subpopulation (40%) of high-threshold colonic afferents that constitute 
presumptive mechanonociceptors (Fig. 5a and Extended Data Fig. 5a). 
Correspondingly, ICA-121431 reduced mechanical responses in 50% 
of fibres examined and blocked Hm1a-induced sensitization (Fig. 5a 
and Extended Data Fig. 5a). Moreover, Hm1a significantly reduced the 
threshold for action potential firing in a subset (45%) of retrogradely 
Figure 5 | Colonic afferents display increased sensitivity to Hm1a in a 
model of IBS. a, Left, representative ex vivo single fibre recording from an 
Hm1a (100 nM)-responsive high-threshold mechanoreceptive fibre from a 
healthy control mouse (arrows indicate application and removal of 2 g von 
Frey hair stimulus). Middle, group data from Hm1a-sensitive fibres (6 out 
of 15, defined as >15% increase over baseline; **P < 0.01; see Extended 
Data Fig. 5 for examples and group data from Hm1a-nonresponsive 
fibres). Right, group data from a population (5 out of 10) of ICA-121431 
(500 nM)-sensitive afferents (****P < 0.0001). b, Left, representative 
whole-cell current clamp recording of a retrogradely traced colonic DRG 
neuron in response to 500 ms current injection at rheobase (the minimum 
current injection required to elicit action potential firing). Recordings 
were from the same neuron of a healthy control mouse before and after 
incubation with Hm1a (10 nM). Scale bars, 250 ms, 20 mV. Middle, group 
data show significant reduction in rheobase following Hm1a application 
in a sub-population (5 out of 11) of neurons (*P < 0.05). Hm1a-responsive 
neuron defined as exhibiting >10% change in rheobase from baseline 
control. Right, Hm1a increased the number of action potentials observed 
at 2 × rheobase in these neurons, but not to a level that reached statistical 
significance. c, Left, responses from high-threshold colonic fibres from 
CVH mice before and after application of Hm1a (100 nM). Middle, group 
data from Hm1a-responsive fibres (4 out of 11, ***P < 0.001). Right, 
group data from ICA-121431-sensitive fibres (7 out of 10, **P < 0.01). 
d, Left, representative Hm1a-responsive colonic DRG neuron from CVH 
mice in whole-cell current clamp. Addition of Hm1a reduced rheobase 
(top traces) and increased action potential firing at 2 × rheobase (bottom 
traces). Middle and right, group data from Hm1a-responsive CVH 
neurons (7 out of 11) showing toxin-mediated decrease in rheobase 
(middle, ***P < 0.001) or increase in action potential firing at 2 × rheobase 
(right, *P < 0.05, **P < 0.01). Error bars represent mean ± s.e.m. P values 
based on paired Student’s t-test (a-d, middle) or one-way ANOVA with 
post hoc Bonferroni test (a-d, right). 
traced colonic DRG neurons, as measured by whole-cell current clamp 
analysis (Fig. 5b and Extended Data Fig. 5b). These results 
demonstrate that a subset of high-threshold mechanosensitive colonic fibres 
express functional NaV1.1 channels. 

In colonic afferents from CVH mice, baseline mechanosensory 
responses were elevated compared to healthy controls (compare Fig. 5a 
and c). Application of Hm1a enhanced mechanically evoked spiking in 
a subset (36%) of CVH fibres beyond this already elevated level (Fig. 5c). 
Notably, in the context of CVH (and in contrast to normal controls), 
toxin application markedly increased the electrical excitability of most 
(64%) retrogradely traced colonic DRG neurons (Fig. 5d), suggesting 
that NaV1.1 channels were functionally upregulated. Furthermore, 
ICA-121431 reduced mechanosensory responses in most (70%) CVH 
sensitized fibres to levels resembling those of baseline controls 
(compare Fig. 5a and c) and blocked the sensitizing effects of Hm1a (Fig. 5c). 
Together, these results support the idea that NaV1.1 is involved in 
mechanical hypersensitivity in IBS. 


Concluding remarks 

The development of NaV channel subtype-selective ligands is an 
important but challenging goal. Our results identify a site within the 
DIV S1-S2 loop that enhances subtype selectivity, providing a 
potential strategy for designing other subtype-specific gating modifiers. 
Moreover, toxins such as Hm1a and Hm1b, which alter inactivation, 
may be of particular utility in boosting NaV channel activity where 
partial loss-of-function has been linked to developmental or 
neurodegenerative disorders, such as autism, Alzheimer disease and Dravet 
syndrome. Analysis of toxin-channel interactions, including the 
multi-site nature of this pharmacophore, may shed new light on 
strategies for developing a broader class of molecules with similar selectivity 
and functional profiles. 

The critical role of NaV1.1 in the brain may have prevented its 
prior recognition as a contributor to peripheral pain signalling. Our 
results now unambiguously implicate NaV1.1 and NaV1.1-expressing 
myelinated afferents in nociception. Activation or sensitization of 
these fibres is sufficient to elicit robust acute pain and mechanical 
allodynia without triggering neurogenic inflammation, distinguishing 
these fibres from well-characterized C-nociceptors. Previous studies 
have implicated myelinated Aδ fibres in mechanonociception, and 
NaV1.1 now provides an important new marker with which to more 
precisely identify the contribution of these fibres to acute and chronic 
pain. 

Our experiments in CVH mice suggest that pharmacological 
blockade of NaV1.1 represents a novel therapeutic strategy for diminishing 
the chronic pain in IBS, and perhaps other pain conditions 
associated with mechanical sensitization, including migraine headache. 
Although NaV1.1 activity in the brain may underlie aura in patients 
with type 3 familial hemiplegic migraine (FHM3), our results 
suggest that these gain-of-function mutations may also produce migraine 
pain through actions of NaV1.1 in mechanical nociceptors. In fact, 
anticonvulsants that target NaV channels, including NaV1.1, have been 
shown to reduce migraine attacks in some individuals. Moreover, 
rufinamide, an anticonvulsant that was recently shown to inhibit 
NaV1.1 (ref. 47), has also been reported to diminish 
nerve-injury-evoked mechanical allodynia. Our findings provide a mechanistic 
rationale for these actions, and motivate further analysis of the roles of 
NaV1.1 and NaV1.1-expressing nociceptors in acute and persistent pain. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS 


Venom collection and screening. Venom from spiders, scorpions and centipedes 
was collected by mild electrical stimulation of the chelicerae, telson or forcipules, 
respectively. Venom samples were then lyophilized and kept frozen until use. 
Approximately 100 venoms were tested by ratiometric calcium imaging using a 
standard inverted microscope setup. Responses to high extracellular potassium 
(150 mM), capsaicin (1 µM), or previously characterized venoms or purified 
toxins were used to validate the health and robustness of sensory neuron cultures 
used in screening assays. Responses were digitized and analysed using MetaMorph 
software (Molecular Devices). Venom-evoked responses that were stimulus-locked, 
visually detectable above background, and restricted to neurons (that is, did not 
cause calcium entry into glia or fibroblasts) were selected for further analysis. 
Pharmacological analysis was used to narrow down potential targets and crude 
venoms or purified fractions were subsequently tested on cloned candidate 
channels. Candidates were taken forward based on the robustness of the response 
and evidence for selectivity at novel targets. See Supplementary Information for a 
summary of venoms that produced no detectable or specific response in our hands. 
Hm1a/b isolation. Venom from H. maculata (1 mg dried) was fractionated on 
a C18 reversed-phase (RP) high-performance liquid chromatography (HPLC) 
column (Jupiter 250 × 4.6 mm, 5 µm; Phenomenex) on a Shimadzu Prominence 
HPLC system. The following linear gradient of solvent B (90% acetonitrile 
(MeCN), 0.1% formic acid in water) in solvent A (0.1% formic acid in water) was 
used at a flow rate of 1 ml min⁻¹: 5% B for 5 min, then 5-20% B for 5 min followed 
by 20-40% B over 40 min. Absorbance was measured at 214 nm and 280 nm and 
collected fractions were lyophilized before storage at −20 °C. 
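
The gradient programme can be written as a piecewise-linear function of time. The sketch below is illustrative only: it assumes the three segments run back to back starting at t = 0, and the names `SEGMENTS` and `percent_b` are hypothetical rather than part of any instrument software.

```python
# Gradient segments as (start_min, end_min, start_%B, end_%B), read from the text.
# Assumption: segments are consecutive and the run starts at t = 0.
SEGMENTS = [
    (0.0, 5.0, 5.0, 5.0),      # hold at 5% B for 5 min
    (5.0, 10.0, 5.0, 20.0),    # 5-20% B over 5 min
    (10.0, 50.0, 20.0, 40.0),  # 20-40% B over 40 min
]

def percent_b(t_min):
    """Percentage of solvent B at time t_min, by linear interpolation."""
    for t0, t1, b0, b1 in SEGMENTS:
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise ValueError("time outside the programmed gradient (0-50 min)")

for t in (0, 7.5, 10, 30, 50):
    print(f"t = {t:>4} min: {percent_b(t):.1f}% B")
```

Under these assumptions the final segment rises by 0.5% B per minute, the shallow slope over which closely related peptides are most likely to be resolved.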

Mass spectrometry. Peptide masses were determined by MALDI-TOF MS using 
a 4700 Proteomics Bioanalyzer model (Applied Biosystems). Peptides were 
dissolved in water and mixed 1:1 (v/v) with α-cyano-4-hydroxycinnamic acid matrix 
(7 mg ml⁻¹ in 50% MeCN, 5% formic acid) and mass spectra acquired in positive 
reflector mode. All reported masses are for the monoisotopic M+H⁺ ions. 
Sequence determination. N-terminal sequencing was performed by the Australian 
Proteome Analysis Facility. In brief, Hm1a (600 pmol) and Hm1b (250 pmol) were 
reconstituted and reduced by addition of DTT (25 mM) followed by incubation at 
56°C for 0.5h. The samples were then alkylated using iodoacetamide (55 mM) at 
room temperature for 0.5h and purified by RP-HPLC using a Zorbax 300SB-C18 
column (3 x 150mm). The target peaks of interest were identified, collected and 
then reduced to minimal volume under vacuum. The entire sample was loaded 
onto a precycled, Biobrene-treated disc and subjected to 37 (Hm1a) or 42 (Hm1b) 
cycles of Edman N-terminal sequencing. Automated Edman degradation was 
carried out using an Applied Biosystems 494 Procise Protein Sequencing System. 

Edman degradation of Hm1a yielded 
ECRYLFGGCSSTSDCCKHLSCRSDWKYCAWDGTF as the sequence, which has a 
calculated monoisotopic mass (for the M+H⁺ ion) of 3,908.58 Da. This is 
86.97 Da short of the monoisotopic mass of Hm1a of 3,995.55 Da. Hence, 
we concluded that a serine residue (87 Da) was missing from the 
C-terminal end of Hm1a to give a complete sequence 
of ECRYLFGGCSSTSDCCKHLSCRSDWKYCAWDGTFS. The complete 
sequence has a calculated monoisotopic mass (for the M+H⁺ ion) of 3,995.61 Da, 
which is only 0.06 Da different from the mass that was measured for the native 
Hm1a. 
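
The mass-gap reasoning used to assign the missing C-terminal residue can be retraced numerically. This is a minimal sketch using only the masses quoted in the text; the serine residue mass (87.03203 Da, monoisotopic, C3H5NO2) is a standard value supplied here as an assumption, and the variable names are illustrative.

```python
# Standard monoisotopic mass of a serine residue (C3H5NO2), in Da.
SER_RESIDUE = 87.03203

measured_native = 3995.55  # native Hm1a, M+H+ (from the text)
calc_edman = 3908.58       # calculated M+H+ of the Edman-derived sequence

# Mass unaccounted for by the Edman-derived sequence.
gap = measured_native - calc_edman

# Adding one serine residue should close the gap.
calc_complete = calc_edman + SER_RESIDUE
residual = calc_complete - measured_native

print(f"gap between native and Edman sequence: {gap:.2f} Da")
print(f"calculated mass with C-terminal Ser:   {calc_complete:.2f} Da")
print(f"difference from measured native mass:  {residual:.2f} Da")
```

The gap lies within 0.1 Da of a serine residue, and adding that residue brings the calculated mass to within about 0.06 Da of the measured value, consistent with the assignment.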

Edman degradation of Hm1b yielded 
ECRYLFGGCKTTADCCKHLGCRTDLYYCAWDGT as the sequence, which has a 
calculated monoisotopic mass (for the M+H⁺ ion) of 3,745.6 Da. This is 
147 Da short of the monoisotopic mass of Hm1b of 3,892.60 Da. We 
therefore concluded that an amidated phenylalanine 
was missing from the C-terminal end of Hm1b to give a complete sequence of 
ECRYLFGGCKTTADCCKHLGCRTDLYYCAWDGTF-NH2. To confirm that the 
C terminus of Hm1b is amidated, we digested native Hm1b with carboxypeptidase 
Y (CPY) and monitored the reaction by MALDI-TOF MS to identify the mass of 
the C-terminal residue as described previously. Native Hm1b (5 µl of 800 ng µl⁻¹) 
in 100 mM ammonium acetate, pH 5.5, was incubated with 2 ng µl⁻¹ CPY at 37 °C 
for 20 min. The reaction was monitored by removing 0.4 µl at 0, 1, 5, 10 and 
20 min and spotting it on a MALDI plate with an equal volume of 7 mg ml⁻¹ 
α-cyano-4-hydroxycinnamic acid in 60% (v/v) MeCN, 5% formic acid (FA). Dried spots 
were washed with 10 µl 1% FA and allowed to dry before they were analysed 
by MALDI-TOF MS on a 4700 Proteomics Bioanalyzer (Applied Biosystems), 
acquiring spectra in reflector positive mode. The first CPY-mediated cleavage 
yielded a mass difference of 146 Da, which corresponds to an amidated 
phenylalanine residue. Thus, the complete sequence has a calculated monoisotopic mass 
(for the M+H⁺ ion) of 3,892.64 Da, matching native Hm1b. 

Hm1a synthesis. Solvents for RP-HPLC consisted of 0.05% TFA/H2O (solvent A) 
and 90% MeCN/0.043% trifluoroacetic acid (TFA)/H2O (solvent B). Analytical 
HPLC was performed on a Shimadzu LC20AT system using a Thermo Hypersil 
GOLD 2.1 × 100 mm C18 column heated at 40 °C with a flow rate of 0.3 ml min⁻¹. 
A gradient of 10 to 55% B over 30 min was used, with detection at 214 nm. 
Preparative HPLC was performed on a Vydac 218TP1022 column running at a 
flow rate of 16 ml min⁻¹ using a gradient of 10 to 50% solvent B over 40 min. Mass 
spectrometry was performed on an API2000 (ABI Sciex) mass spectrometer in 
positive ion mode. All reagents were obtained commercially and were used without 
further purification. 

Hm1a was synthesized using regioselective disulfide-bond formation. 
The peptide was assembled on a 0.1-mmol scale using a Symphony (Protein 
Technologies Inc.) automated peptide synthesizer and a H-Ser(tBu)-2-ClTrt (loading 
0.69 mmol g⁻¹) polystyrene resin. Couplings were performed in dimethylformamide 
(DMF) using 5 equivalents of Fmoc-amino acid/2-(1H-benzotriazol-1-yl)-
1,1,3,3-tetramethyluronium hexafluorophosphate (HBTU)/N,N-diisopropylethylamine 
(DIEA) (1:1:1) relative to resin loading for 2 × 20 min. Fmoc deprotection was 
achieved using 30% piperidine/DMF (1 × 1.5 min, then 1 × 4 min). Non-cysteine 
amino acid side-chains were protected as Asp(OtBu), Arg(Pbf), Glu(OtBu), 
His(Trt), Lys(Boc), Ser(tBu), Thr(tBu), Trp(Boc) and Tyr(tBu). The cysteine 
side chains were protected as Cys2,Cys16(Meb), Cys9,Cys21(Dpm), and 
Cys15,Cys28(Trt). Cleavage from the resin was achieved by treatment with 10% 
acetic acid/10% trifluoroethanol (TFE)/dichloromethane (DCM) at room 
temperature for 1 h. The product was precipitated and washed with n-hexane then 
lyophilized from 1,4-dioxane/MeCN/H2O. 

The first disulfide bond (Cys15-Cys28) was formed by dissolving the crude 
product in 1,1,1,3,3,3-hexafluoropropan-2-ol (HFIP) (5 ml) and adding it 
dropwise to a stirred solution of I2 (4 equivalents) in 10% HFIP/DCM (20 ml) over 5 min. 
Stirring was continued for a further 5 min then the solution was poured into a 
solution of ascorbic acid/sodium acetate in H2O. The aqueous phase was extracted 
with DCM, and the combined organic layers washed with water. Following removal 
of solvent under reduced pressure, the product was lyophilized from 
1,4-dioxane/MeCN/H2O. Electrospray ionization mass spectrometry (ESI-MS) (m/z): calc. 
(avg) 2,159.4 [M+3H]3+, found 2,159.7. 

The remaining side-chain-protecting groups (except Cys(Meb)) were removed 
by treatment with 95% TFA/2.5% triisopropylsilane (TIPS)/2.5% H2O at room 
temperature for 2 h. After most of the cleavage solution was evaporated under a stream 
of N2, the product was precipitated and washed with cold Et2O and lyophilized 
from 50% MeCN/0.1% TFA/H2O to give Cys2,Cys16(Meb), Cys9,Cys21(SH), 
Cys15-Cys28(SS) Hm1a (280 mg). ESI-MS (m/z): calc. (avg) 1,404.3 [M+3H]3+, 
found 1,404.1. 

The second disulfide bond (Cys9-Cys21) was formed by dissolving the crude 
product from the previous step in 30% DMSO/0.1 M HCl (0.5 mg ml⁻¹) and 
stirring at room temperature for 24 h. Cys2,16(Meb), Cys9-Cys21(SS), 
Cys15-Cys28(SS) Hm1a was then isolated by preparative HPLC (30 mg). ESI-MS (m/z): 
calc. (avg) 1,403.6 [M+3H]3+, found 1,403.3. 

Formation of the third disulfide bond (Cys2-Cys16) was then achieved by 
first removing the Cys(Meb) groups by treatment with HF/p-cresol (9:1) at 0 °C 
for 1 h. The product was precipitated and washed with cold Et2O and lyophilized 
from 50% MeCN/0.1% TFA/H2O yielding Cys2,16(SH), Cys9-Cys21(SS), 
Cys15-Cys28(SS) Hm1a (24 mg). ESI-MS (m/z): calc. (avg) 1,334.1 [M+3H]3+, found 
1,333.7. Oxidation of the liberated thiols was performed using 30% DMSO/0.1 M 
HCl as described for the second disulfide bond to yield fully oxidised Hm1a (3 mg) 
that was indistinguishable by analytical HPLC from an authentic sample. ESI-MS 
(m/z): calc. (avg) 1,333.5 [M+3H]3+, found 1,333.1. 
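
Each oxidation step removes two hydrogen atoms, so the triply charged ion should shift down by roughly 2.016/3 ≈ 0.67 m/z units. The sketch below checks the calculated values quoted in the text against that expectation. The helper names `mz` and `neutral_mass` are hypothetical, and the proton and hydrogen masses are standard constants supplied as assumptions.

```python
PROTON = 1.00728  # Da, mass of a proton
H_ATOM = 1.00794  # Da, average mass of a hydrogen atom

def mz(neutral_mass_da, z):
    """m/z of an [M+zH]z+ ion."""
    return (neutral_mass_da + z * PROTON) / z

def neutral_mass(mz_value, z):
    """Neutral mass M recovered from the m/z of an [M+zH]z+ ion."""
    return z * (mz_value - PROTON)

# Expected m/z shift at z = 3 when one disulfide forms (loss of 2 H).
expected_shift = 2 * H_ATOM / 3

# Calculated [M+3H]3+ values from the text, before and after each oxidation.
steps = {
    "Cys9-Cys21": (1404.3, 1403.6),
    "Cys2-Cys16": (1334.1, 1333.5),
}
for bond, (before, after) in steps.items():
    print(f"{bond}: shift {before - after:.1f}, expected about {expected_shift:.2f}")

print(f"neutral mass of fully oxidised Hm1a: about {neutral_mass(1333.5, 3):.1f} Da")
```

Both reported shifts agree with the expected value to within the one-decimal precision of the quoted masses.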

NaV and KV channel constructs. Human (h)NaV1.4, hNaV1.5, and rat (r)KV2.1 
were provided by P. Ruben, C. Ahern, and K. Swartz, respectively. hNaV1.1-1.3 
and hNaV1.6-1.8 were obtained from Origene Technologies, Inc. Accession 
numbers are NM_001165963.1 (hNaV1.1), NM_021007.2 (hNaV1.2), NM_006922.3 
(hNaV1.3), NM_000334 (hNaV1.4), NM_198056 (hNaV1.5), NM_014191.2 
(hNaV1.6), NM_002977.2 (hNaV1.7), and NM_006514.3 (hNaV1.8). Channel 
chimaeras were generated using sequential PCR with rNaV1.4 (gift from 
B. Chanda), KV2.1Δ7 (refs 54,55), hNaV1.1, and hNaV1.9 (ref. 24) (Origene 
Technologies: NM_014139.2) as templates. Mouse KV4.1 was obtained from 
Addgene and originated in the laboratory of L. Salkoff. The KV2.1Δ7 construct 
contains seven point mutations in the outer vestibule that render the channel 
sensitive to agitoxin-2, a pore-blocking scorpion toxin. cRNA of all constructs was 
synthesized using T3 or T7 polymerase (mMessage mMachine kit, Life Technologies) 
after linearizing the fully-sequenced DNA with appropriate restriction enzymes. 
Xenopus oocytes. Channels and chimaeras were expressed in Xenopus laevis 
oocytes (animals acquired from Xenopus one) that were incubated at 17 °C 
in Barth’s medium (88 mM NaCl, 1 mM KCl, 0.33 mM Ca(NO3)2, 0.41 mM 
CaCl2, 0.82 mM MgSO4, 2.4 mM NaHCO3, 5 mM HEPES, and 0.1 mg ml⁻¹ 
gentamycin; pH 7.6 with NaOH) for 1-4 days after cRNA injection, and then 
were studied using two-electrode voltage-clamp recording techniques (OC-725C; 
Warner Instruments or GeneClamp 500B; Axon Instruments) with a 150-µl 
recording chamber or a small volume (<20 µl) oocyte perfusion chamber (AutoMate 
Scientific). Data were filtered at 4 kHz and digitized at 20 kHz using pClamp 10 
software (Molecular Devices). Microelectrode resistances were 0.5-1 MΩ when 
filled with 3 M KCl. For KV2.1 and KV2.1 chimaera experiments, the external 
recording solution contained (in mM): 50 KCl, 50 NaCl, 5 HEPES, 1 MgCl2 and 
0.3 CaCl2, pH 7.6 with NaOH. For NaV and KV4.1 experiments, the external 
recording solution contained (in mM): 100 NaCl, 5 HEPES, 1 MgCl2 and 1.8 CaCl2, 
pH 7.6 with NaOH. All experiments were performed at room temperature (~22 °C) and toxin 
samples were diluted in recording solution with 0.1% BSA. Leak and background 
conductance, identified by blocking the channel with agitoxin-2 or TTX, were 
subtracted for KV or NaV channel currents, respectively. Voltage-activation 
relationships were obtained by measuring tail currents for KV channels, or by monitoring 
steady-state currents and calculating conductance for NaV channels. Occupancy 
of closed or resting channels by toxins was examined using negative holding 
voltages where open probability was low, and the fraction of unbound channels was 
estimated using depolarizations that are too weak to open toxin-bound channels. 
After addition of toxin to the recording chamber, equilibration between toxin and 
channel was monitored using weak depolarizations elicited at 5-10-s intervals. 
For all channels, voltage-activation relationships were recorded in the absence 
and presence of toxin. Off-line data analysis was performed using Clampfit 10 
(Molecular Devices) and Origin 7.5 (Originlab). 

Multiple protocols were used to probe the biophysical characteristics of the 
NaV channels and chimaeras studied. To determine conductance-voltage and 
steady-state inactivation relationships, oocytes expressing NaV channels were 
held at −90 mV and depolarized in 5-mV steps from −90 mV to 5 mV for 50 ms, 
immediately followed by a step to −15 mV to elicit the maximum available current 
and, after 50 ms, returned to the −90 mV holding potential. Peak current 
generated during the incremental portion of the protocol was used to calculate the 
conductance-voltage relationship while the peak current during the −15 mV step 
as a function of the earlier voltage step was used to determine the steady-state 
inactivation relationship. The time constant of fast inactivation was determined 
by fitting single exponential curves to the −15 mV step of the aforementioned 
protocol. Boltzmann curves were fitted in Clampfit 10 (Molecular Devices) and 
statistics calculated with Excel or the R statistical package (Student's t-test). 
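
For orientation, the quantities derived from this protocol can be written out explicitly. The sketch below uses the standard chord-conductance and two-state Boltzmann forms; it is not the Clampfit fitting routine, and the function names, reversal potential, midpoints and slope factors are hypothetical values chosen for illustration.

```python
import math

def conductance(i_peak, v, e_rev):
    """Chord conductance G = I / (V - E_rev); units follow the inputs."""
    return i_peak / (v - e_rev)

def boltzmann_activation(v, v_half, k):
    """Fraction of maximal conductance at test voltage v (mV)."""
    return 1.0 / (1.0 + math.exp((v_half - v) / k))

def boltzmann_inactivation(v, v_half, k):
    """Fraction of channels still available after a prepulse to v (mV)."""
    return 1.0 / (1.0 + math.exp((v - v_half) / k))

# Voltage steps as in the protocol: -90 mV to +5 mV in 5-mV increments.
steps_mv = list(range(-90, 10, 5))

# Hypothetical midpoints and slope factors, for illustration only.
for v in steps_mv:
    g = boltzmann_activation(v, v_half=-30.0, k=6.0)
    h = boltzmann_inactivation(v, v_half=-55.0, k=6.0)
    print(f"{v:4d} mV  G/Gmax = {g:.3f}  I/Imax = {h:.3f}")
```

In a fit, `v_half` and `k` would be the free parameters. A toxin that inhibits inactivation would be expected to alter the inactivation curve or slow the fitted time constant rather than change the voltage steps themselves.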
Cultured neurons. Whole-cell patch clamp of cultured mouse trigeminal 
neurons was performed as described. Buffer solution contained (in mM) 150 NaCl, 
2.8 KCl, 1 MgSO4, 10 HEPES, pH 7.4 with NaOH and was perfused with or without 
toxins/drugs using a SmartSquirt Micro-Perfusion system (AutoMate). For colonic 
DRG, whole-cell recordings were made from fluorescently labelled thoracolumbar 
(T10-L1) colonic DRG neurons 20-48 h after plating, using fire-polished glass 
electrodes with a resistance of 2-5 MΩ. All recordings were performed at room 
temperature (20-22 °C). Signals were amplified by using an Axopatch 200A amplifier, 
digitized with a Digidata 1322A and recorded using pCLAMP 9 software (Molecular 
Devices). For all DRG neurons the holding potential was −70 mV. In current clamp 
mode a series of depolarizing pulses (500 ms, 10 pA step) were applied from 
holding potential (−70 mV) and the rheobase (amount of current (pA) required for 
action potential generation) determined. The number of action potentials at 2 × 
rheobase was also determined. Depolarizing pulses were tested in normal external 
bath solution and following the addition of Hm1a (100 nM). Control solutions and 
Hm1a were applied with a gravity-driven multi-barrel perfusion system positioned 
within 1 mm of the neuron under investigation. Intracellular solutions contained 
(in mM): 135 KCl; 2 MgCl2; 2 MgATP; 5 EGTA-Na; 10 HEPES-Na; adjusted to 
pH 7.4. Extracellular solutions contained (in mM): 140 NaCl; 4 KCl; 2 MgCl2; 
2 CaCl2; 10 HEPES-Na; 5 glucose; adjusted to pH 7.4. 
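
The rheobase measurement described above amounts to finding the first current step that evokes a spike. The sketch below is an illustration under stated assumptions: the first pulse is taken to inject one 10-pA step, the spike counts are hypothetical, and the >10% responder criterion is the one given in the Fig. 5 legend. The function names are not from the study.

```python
def rheobase(spike_counts, step_pa=10):
    """Smallest injected current (pA) that evokes at least one action potential.

    spike_counts[i] is the number of spikes during pulse i, which is assumed
    to inject (i + 1) * step_pa. Returns None if no pulse evoked a spike.
    """
    for i, n_spikes in enumerate(spike_counts):
        if n_spikes > 0:
            return (i + 1) * step_pa
    return None

def is_responder(rheo_before, rheo_after, threshold=0.10):
    """True if rheobase changed by more than `threshold` relative to baseline."""
    return abs(rheo_after - rheo_before) / rheo_before > threshold

# Hypothetical spike counts per 500-ms pulse, before and after toxin.
before = [0, 0, 0, 0, 1, 2]
after = [0, 0, 1, 2, 3, 4]

r_before, r_after = rheobase(before), rheobase(after)
print(f"rheobase before: {r_before} pA, after: {r_after} pA")
print(f"responder: {is_responder(r_before, r_after)}")
```

With a 10-pA step, the rheobase is only resolved to the nearest 10 pA, so for neurons with a low baseline rheobase a single step already exceeds the 10% criterion.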
Skin-nerve recordings. To assess primary afferent activity in response to the Hm1a 
spider toxin, we used the ex vivo skin-nerve preparation, as previously described. 
Briefly, animals were lightly anaesthetized with inhaled isoflurane and then killed 
by cervical dislocation. The hair on the lower extremities was shaved, and the 
hairy skin of the hindpaw was then quickly dissected along with its innervating 
saphenous nerve. The skin and nerve were then placed in a recording chamber 
filled with warmed (32 °C), oxygenated buffer consisting of (in mM): 123 NaCl, 
3.5 KCl, 2.0 CaCl2, 1.7 NaH2PO4, 0.7 MgSO4, 9.5 sodium gluconate, 5.5 glucose, 
7.5 sucrose and 10 HEPES titrated to a pH of 7.45 ± 0.05. 

The saphenous nerve was then threaded into a mineral oil-filled chamber,
teased apart atop an elevated mirror plate, and placed on an extracellular
recording electrode. Single-unit receptive fields were then identified with a
mechanical search stimulus using a blunt glass probe. Aδ afferents were identified
based on a conduction velocity between 1.2 and 10 m s−1, and were subtyped into
A-mechanonociceptors based on their slow adaptation to a mechanical stimulus.
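
As a rough illustration of the conduction-velocity criterion, the sketch below computes velocity from a conduction distance and a spike latency and tests it against the 1.2-10 m s−1 window. The function names, the inclusive bounds and the example distance and latency are assumptions made for illustration only.

```python
def conduction_velocity(distance_mm, latency_ms):
    """Conduction velocity in m/s (mm per ms is numerically equal to m per s)."""
    return distance_mm / latency_ms


def is_a_delta(velocity_m_per_s, low=1.2, high=10.0):
    """True if the velocity lies in the A-delta window (bounds assumed inclusive)."""
    return low <= velocity_m_per_s <= high


# Hypothetical fibre: 20 mm from stimulation site to electrode, 4 ms latency.
cv = conduction_velocity(20.0, 4.0)
print(cv, is_a_delta(cv))
```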

After locating an A-mechanonociceptor fibre, to determine the threshold force 
for action potential generation, the receptive field was stimulated with calibrated 
Von Frey filaments. A metal moat (inner diameter: 4.7 mm) was then placed over 
the centre of the receptive field to isolate it from the surrounding buffer. Buffer 
within the moat was then evacuated and replaced with a buffer containing either
1 µM Hm1a or vehicle (buffer). Receptive fields were incubated with toxin or
buffer for 2-5 min. A custom-built, feedback-controlled mechanical stimulator
was then placed within the moat and the receptive field was mechanically
stimulated with a series of increasing forces (15 mN, 50 mN and 100 mN) for 10 s
each. To avoid sensitization/desensitization, a rest period of 1 min was introduced
between stimulations.

Data were digitized using a PowerLab A/D converter and recorded using 
LabChart software and Spike Histogram extension (AD Instruments). All skin- 
nerve data were recorded and analysed with the experimenter blinded to whether 
toxin or vehicle was used. Recordings were included in the final data set only if the 
firing of the fibre was clearly distinguishable from both background noise and any 
other fibres firing during stimulation. 

Gut-nerve recordings. Ex vivo single-unit extracellular recordings of action 
potential discharge were made from splanchnic colonic afferents. Recordings were 
made from healthy or CVH mice using standard protocols. Baseline mechano- 
sensitivity was determined in response to application of a 2-g Von Frey hair probe 
to the afferent receptive field for 3 s. This process was repeated 3-4 times, separated 
each time by 10 s. Mechanosensitivity was then re-tested after the application of
Hm1a (100 nM) or the Nav1.1 blocker ICA-121432 (500 nM) or a combination
of both ICA-121432 (500 nM) and Hm1a (100 nM). Instantaneous frequency is
defined as the inverse of the time interval between an action potential and the
previous action potential. After application of Hm1a, significant increases (P < 0.05)
in mechanically evoked firing were seen overall in both healthy and CVH fibres.
However, in both conditions we clearly recorded Hm1a-responsive and
Hm1a-non-responsive neurons. We therefore binned fibres by responsiveness and present
these data separately for clarity (Fig. 5 and Extended Data Fig. 5). Group data are
presented as spikes per second and are expressed as mean ± s.e.m.
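
The definition of instantaneous frequency given above, and the mean ± s.e.m. summary, can be sketched as follows. The spike times are hypothetical, and the helper names are illustrative rather than part of the analysis software used in the study.

```python
import math
import statistics


def instantaneous_frequencies(spike_times_s):
    """Inverse of each interval between an action potential and the previous one (Hz)."""
    return [1.0 / (t1 - t0) for t0, t1 in zip(spike_times_s, spike_times_s[1:])]


def mean_sem(values):
    """Mean and standard error of the mean (sample s.d. divided by sqrt(n))."""
    n = len(values)
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(n)


# Hypothetical spike times (s) during one mechanical stimulus.
spikes = [0.0, 0.5, 0.75, 1.0]
freqs = instantaneous_frequencies(spikes)
print(freqs)
print(mean_sem(freqs))
```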

Animal use, husbandry and genotyping. Mice were bred and housed in accord- 
ance with UCSF Institutional Animal Care Committee (IACUC) guidelines. 
Animals were housed in groups of 2-5 with constant access to food and water. 
Floxed Scn1a mice were provided by W. Catterall. Floxed mice were bred to
peripherin-Cre (Per-Cre) mice to produce floxed Scn1a × Per-Cre conditional
knockout mice. Nav1.1 floxed alleles were detected using primers previously
described and Per-Cre expression was detected using the following primers
to Cre recombinase: Cre_F: TAGCGTTCGAACGCACTGATTTCG; Cre_R: 
CGCCGTAAATCAATCGATGAGTTG. 

Somatic behavioural experiments were approved by UCSF IACUC and were in 
accordance with the National Institutes of Health (NIH) Guide of the Care and Use 
of Laboratory Animals and the recommendation of the International Association 
for the Study of Pain. Animals used in skin-nerve recordings were naive C57BL/6 
male mice (n = 10), aged 6-16 weeks. Mice were housed on a 14:10 h light:dark
cycle with ad libitum access to food and water in a climate-controlled room. All 
protocols were approved by the Institutional Animal Care and Use Committee at 
the Medical College of Wisconsin. Animals used in colonic afferent and colonic 
DRG neuron studies were male C57BL/6J mice. The Animal Ethics Committees of 
The University of Adelaide and the South Australian Health and Medical Research 
Institute (SAHMRI) approved experiments involving animals. 

Sensory neuron culture and calcium imaging. Trigeminal ganglia were dissected 
from newborn (P0-P3) Sprague-Dawley rats or C57BL/6 mice and cultured for 
>12 h before calcium imaging or electrophysiological recording. Embryonic
DRG cultures were provided by J. Chan. Embryonic cultures were maintained
as described and calcium imaging experiments were performed 1-10 d after
primary cultures were established. Primary cells were plated onto cover slips coated
with poly-L-lysine (Sigma) and laminin (Invitrogen, 10 µg ml−1). Cells were loaded
for calcium imaging with Fura-2-AM (Molecular Probes) for >1 h. Buffer solution
(in mM: 150 NaCl, 2.8 KCl, 1 MgSO4, 10 HEPES, pH 7.4 with NaOH) was
perfused with or without toxins/drugs using a SmartSquirt Micro-Perfusion system
(AutoMate). 

In situ hybridization and immunohistochemistry. In situ hybridization (ISH) was 
performed using ViewRNA ISH Tissue 2-plex or 1-plex Assay Kits (Affymetrix). 
Target mRNA signals appear as puncta in bright field or fluorescent microscopy. 
Eight-to-twelve-week-old mice were deeply anaesthetized with pentobarbital then 
transcardially perfused with 10 ml PBS followed by 10 ml 10% neutral buffered 
formalin (NBF). DRGs were dissected, post-fixed in 10% NBF at 4°C overnight, 
cryoprotected in PBS with 30% (w/v) sucrose overnight at 4°C, then embedded 
in OCT compound at −20 °C. Tissue was sectioned at 12 µm, thaw-captured on
Diamond White Glass slides (Globe Scientific), and stored at −20 °C until use.
Slides were used within 2 weeks of processing to produce optimal signals. 

ViewRNA ISH Tissue 2-plex assay was performed with frozen tissue modifications 
as indicated by the manufacturer including the endogenous alkaline phosphatase 
inactivation by incubation in H2O with 0.1 M HCl and 300 mM NaCl. The 
haematoxylin and eosin counterstaining procedure was omitted. Images were 
acquired with a Leica DMRB microscope and DFC500 digital camera using Leica
Application Suite v3.5.0 then analysed further using ImageJ software. 

To co-label neuronal subpopulation markers (NF200, IB4, CGRP, TH) and
Nav1.1 mRNA, ViewRNA ISH Tissue 1-plex assay and immunohistochemistry
(IHC) were performed sequentially using a protocol modified from ref. 64. ISH/ 
IHC was not found to be compatible with all primary antibodies. Animals, tissue, 
and slides were prepared as described in the preceding paragraph. Frozen slides 
with tissue sections were warmed in a vacuum oven for 10 min at 60°C, fixed in 
PBS with 4% (v/v) formaldehyde for 10 min at room temperature and then pro- 
cessed according to the manufacturer's protocol with frozen tissue modifications 
in a ThermoBrite Slide Processing System (Abbott Molecular). Washing steps were 
performed as indicated, in a deliberate and vigorous manner. Optimal protease 
and probe incubation times were determined to be 12 min and 2h, respectively. 
Following development in Fast Red Substrate, slides were rinsed briefly in PBS then 
immediately processed for IHC. Slides were incubated for 1h at room temperature 
in a blocking solution consisting of PBS with 0.1% (v/v) Triton X-100 (Sigma) and 
10% normal goat serum (NGS). Slides were then incubated in primary antibody 
solution (PBS with 0.1% Triton X-100 and 2.5% NGS) overnight at 4°C, vigorously 
agitated for 2 min in fresh PBS three times, then incubated in secondary anti- 
body solution (PBS with 0.1% (v/v) Triton X-100) for 2h at room temperature in 
the dark. Sections were then washed by vigorous agitation for 2 min in fresh PBS 
three times before mounting with ProLong Gold antifade reagent with DAPI (Life 
Technologies) and addition of coverslips. Images were acquired with a Leica DMRB 
microscope and DFC500 digital camera using Leica Application Suite v3.5.0 then 
further analysed using ImageJ software. 

Affymetrix was commissioned to design a Type 1 probe set to mouse
Nav1.1 (Scn1a, NM_018733.2) and Type 6 probe sets to mouse TRPV1 (Trpv1,
NM_001001445.2), mouse Nav1.7 (Scn9a, NM_001290674.1), mouse 5HT3
(Htr3a, NM_001099644.1), and mouse TRPM8 (Trpm8, NM_134252.3) coding
regions. We used the following primary antibodies: mouse anti-NF200 (1:10,000,
Sigma), rabbit anti-CGRP (1:10,000, Peninsula Labs), and rabbit anti-TH 
(1:5,000, AbCam). We used fluorophore-conjugated secondary antibodies raised 
in goat against mouse or rabbit, as appropriate (1:1,000, Alexa Fluor 488, Life 
Technologies). To identify IB4-binding cells, biotinylated IB4 (1:1,000, Vector 
Labs) and fluorophore-conjugated streptavidin (1:1,000, Alexa Fluor 488, Life 
Technologies) were used in place of primary and secondary antibodies. Fos staining 
was performed 90 min after hindpaw injection of Hm1a or PBS. Spinal cord sec- 
tions were prepared from lumbar L4/L5 and stained with rabbit anti-Fos (1:5,000, 
CalBiochem). ATF3 antibody (Santa Cruz Biotechnology) was used at 1:2,000. 

Statistics and experimental design. Sample sizes for cellular physiology, histology
and animal behaviour were chosen based on previous experience with these assays 
as the minimum number of independent observations required for statistically sig- 
nificant results. No statistical methods were used to predetermine sample size. For 
histology, at least three sections from each of at least three animals were counted. 
For oocyte and mouse neuron experiments, multiple batches or litters were used 
for all experiments. For behavioural experiments, animals were randomly chosen 
for different experimental cohorts by a blinded experimenter. Experimental and 
control conditions were compared within the same experimental time-course 
using randomly selected animals from one or multiple cages. Responses were then 
scored by an experimenter blinded to injection condition and experimental cohort. 
Animal genotype was tracked by ear tags and genotype unblinding occurred after 
analysis was complete. 

Data were analysed using Prism 6 software (GraphPad Software) and sig- 
nificance testing used either Student’s t-tests or one-way ANOVA followed by 
Bonferroni or Tukey’s post hoc tests, as noted in figure legends. All significance 
tests are two-sided. The number of experiments (n) and significance are reported
in the figure legends. All significance tests were justified as appropriate given the 
experimental design and nature of the comparisons. We assume equal variance and 
normally distributed data within experimental paradigms where comparisons are 
made. These are common assumptions relied upon for significance testing within 
these experimental paradigms as previously published by our group and others. 
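
To make the equal-variance assumption concrete, the following sketch computes the pooled-variance (Student's) t statistic for two independent groups. It is illustrative only: the analysis itself was run in Prism 6, the function name is hypothetical, and the example values are invented. Converting t to a two-sided P value requires the t distribution, which this sketch leaves out.

```python
import math
import statistics


def pooled_t(a, b):
    """Student's two-sample t statistic under equal variances, plus degrees of freedom."""
    na, nb = len(a), len(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * var_a + (nb - 1) * var_b) / df
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


# Invented example values for two groups of three observations.
t_stat, dof = pooled_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(t_stat, dof)
```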

Behaviour. For behavioural experiments in Fig. 4, adult mice (6-12 weeks) were
used and randomly selected for analysis. Male and female mice were first
considered separately in hindpaw nocifensive response experiments. Both sexes showed
significantly greater responses to toxin in wild-type littermate versus floxed Nav1.1 ×
Per-Cre CKO mice (one-sided unpaired Student's t-test, P < 0.05, wild-type female:
n = 5, CKO female: n = 6, wild-type male: n = 5, CKO male: n = 5). Therefore, male
and female behavioural responses were pooled and subsequent experiments were
performed on both male and female mice for CKO and wild-type littermate
experiments, or only male mice for other conditions (for example, capsaicin ablation).
Nocifensive responses were recorded during a 20-min observation period
immediately following intraplantar injections (10 µl PBS with or without 5 µM Hm1a).
Licking/biting behaviour was scored as seconds of behaviour with the experimenter
blinded to injection condition and experimental cohort (wild-type, CKO or
capsaicin-ablated mice). Hargreaves and Von Frey tests were performed 30 min after
intraplantar injection of 500 nM Hm1a or Hm1b to measure heat and mechanical
sensitivity, respectively. Intrathecal (i.t.) capsaicin ablation was performed as
previously described, and i.t. capsaicin-treated mice were tested on a hot plate to
ensure ablation of TRPV1+ afferents. Ablation was also confirmed by histology.

Model of chronic visceral hypersensitivity. Colitis was induced by administration
of 2,4,6-trinitrobenzenesulfonic acid (TNBS) as described previously.
Briefly, 13-week-old anaesthetized mice were administered an intra-colonic
enema of 0.1 ml TNBS (130 µg ml−1 in 30% ethanol) via a polyethylene
catheter. Histological examination of mucosal architecture, cellular infiltrate,
crypt abscesses, and goblet cell depletion confirmed significant TNBS-induced
damage by day 3 post-treatment, which largely recovered by day 7, and fully
recovered by 28 days. High-threshold nociceptors from mice at the 28-day time point
displayed significant mechanical hypersensitivity, lower mechanical activation
thresholds, and hyperalgesia and allodynia. As such, they are termed chronic
visceral hypersensitivity (CVH) mice.

Retrograde tracing and cell culture of colonic DRG neurons. Healthy and CVH
mice of 16 weeks of age were anaesthetized with halothane and following midline
laparotomy, three 10-µl injections of the fluorescent retrograde neuronal tracer
cholera toxin subunit B conjugated to Alexa Fluor 488 were made sub-serosally
within the wall of the descending colon. Four days after injection, mice were killed
by CO2 inhalation and DRGs from T10-L1 were surgically removed. DRGs were
digested with 4 mg ml−1 collagenase II (GIBCO, Invitrogen) and 4 mg ml−1 dispase
(GIBCO) for 30 min at 37 °C, followed by 4 mg ml−1 collagenase II for 10 min at
37 °C. Neurons were mechanically dissociated into a single-cell suspension via
trituration through fire-polished Pasteur pipettes. Neurons were resuspended in
DMEM (GIBCO) containing 10% FCS (Invitrogen), 2 mM L-glutamine (GIBCO),
100 µM MEM non-essential amino acids (GIBCO) and 100 mg ml−1 penicillin/
streptomycin (Invitrogen). Neurons were spot-plated on 8-mm HCl-treated
coverslips coated with poly-D-lysine (800 µg ml−1) and laminin (20 µg ml−1) and
maintained in an incubator at 37 °C in 5% CO2.

Extended Data Figure 1 | Hm1a and Hm1b selectively target Nav1.1 in
sensory neurons. a, Left, HPLC chromatogram showing reversed-phase
C18 fractionation of Heteroscodra maculata venom; peaks containing
Hm1a and Hm1b are labelled. Peptide sequences as determined by Edman
degradation are displayed above. Middle, MALDI-TOF spectra of native
undigested Hm1b (top) and native Hm1b digested with carboxypeptidase
Y for 20 min (bottom), with inserted spectra showing the monoisotopic
mass of each in daltons (Da). The observed mass difference of 146 Da
between the intact and digested Hm1b corresponds to the final residue,
Phe, with an amidated C terminus. Right, chromatograms show
reversed-phase C18 HPLC profiles of native and synthetic Hm1a, which were
indistinguishable when co-injected. b, Representative currents from
oocytes expressing hNav subtypes before (black) and after (grey) bath
application of ICA-121431 (500 nM). Currents were elicited during 1-Hz
stimulation to induce use-dependent block. c, Top, amino acid sequence
comparison of Hm1a with SGTx1, a related, but non-selective
fast-inactivation inhibitor. Bottom, representative calcium imaging experiment
comparing ICA-121431-mediated block of Hm1a- or SGTx1-evoked
responses in cultured embryonic DRG neurons, with group data at right
(**P < 0.01, ***P < 0.001, n = 4). d, Top, fraction of P0 mouse neurons
responding to Hm1a versus SGTx1 (**P < 0.01). Bottom, ratiometric
calcium responses elicited by SGTx1 (500 nM) in the presence and absence
of ICA-121431 (500 nM). e, Dose-response curves for Hm1a inhibition of
fast inactivation in oocytes expressing Nav1.1, Nav1.2 or Nav1.3. Sustained
current at the end of a 100-ms pulse is normalized to peak current to
quantify magnitude of the effect. EC50 values for hNav1.1 = 38 nM,
hNav1.2 = 236 nM and hNav1.3 = 220 nM. f, Representative traces from
oocytes expressing hNav subtypes in response to a saturating dose
(on hNav1.1) of purified Hm1b during a 100-ms depolarization. g, rKv2.1
chimaeras containing different Nav1.9 S3b-S4 motifs were tested for
sensitivity to Hm1a (100 nM). Representative traces (top) and summary
data (bottom) show a lack of toxin sensitivity for each chimaera. h, Top
left, representative currents from oocytes expressing mKv4.1 before (black)
and after (red) bath application of Hm1a (5 µM). Middle, quantification of
mKv4.1 blockade by synthetic or native Hm1a. Top right, comparison of
sustained current during application of native or synthetic Hm1a (1 µM)
shows similar effects on Nav1.1 inactivation. Bottom, representative traces
(left) showing that outward currents in P0 trigeminal mouse neurons
are unaffected by Hm1a (500 nM). Scatter plot (right, n = 10) shows
no significant difference. i, Percentage of Hm1a (500 nM)-responsive
neurons in various culture conditions as assessed by calcium imaging
(n = 3-4, *P < 0.05). Error bars represent mean ± s.e.m. P values based
on two-way ANOVA with post hoc Tukey’s test (c) or unpaired two-tailed
Student’s t-test (d, i).

Extended Data Figure 2 | Hm1a selectivity depends on DIV S1-S2 and
S3b-S4 regions. a, Top, alignments between Kv2.1 and hNav1.1 S3b-S4
regions from each domain (as indicated) with sequence of chimaeras
shown below each alignment. Bottom, G-V relationships from chimaeric
channels expressed in oocytes in the absence (black) and presence
(colours) of Hm1a (100 nM). b, Sequence alignment of hNav1.1 and
rNav1.4 showing putative transmembrane segments (green) and regions
swapped in chimaeric channels (grey). c, Top, using the background of
Nav1.4 chimaera containing the S3b-S4 and S5-S6 regions of Nav1.1,
individual residues were mutated in the S1-S2 loop to the cognate residue
in Nav1.1. The D1376T and Y1379S point mutants in the chimaeric
rNav1.4 channel reveal an increase in peak current after 100 nM Hm1a
application (red) relative to untreated controls (black). Filled circles denote
G-V relationships, where oocytes were depolarized for 50 ms in 5-mV
steps from a holding potential of −90 mV. Open circles denote
steady-state inactivation (I/Imax) relationships, where oocytes were depolarized
from −90 mV to +5 mV in 5-mV increments for 50 ms preceding a
50-ms step to −15 mV. Middle, dot plot detailing per cent increase in peak
conductance of each point mutant in response to 100 nM Hm1a treatment.
Each point represents a single oocyte; red bars indicate 95% confidence
interval. Mutations highlighted in orange (D1376T and Y1379S) are
statistically different from S3b-S4/S5-S6 control (*P < 0.01, Student’s
t-test). The Q1372E mutant did not generate currents. Bottom, alignment
of DIV S1-S4 regions from relevant mouse and human Nav isoforms.
Orange highlights location of residues in the S1-S2 loop that putatively
contribute to the toxin effect. d, Left, stylized DIV with transmembrane
segments represented as circles and extracellular loops as bars (black for
native rNav1.4 channel and green for portions transplanted from hNav1.1).
Middle, traces displaying effect of Hm1a on each chimaera depolarized
to −15 mV from a holding potential of −90 mV. Right,
conductance-voltage (G/Gmax) and steady-state inactivation (I/Imax) relationships of
each channel and chimaera before and after toxin (black and red,
respectively) across a voltage range spanning −90 mV to 0 mV from a
holding potential of −90 mV in 5-mV increments. Scale bars as in Fig. 2.
e, Dot plots displaying the effect of 100 nM Hm1a on peak current
(left) and persistent current (right). Data in the left plot were generated
by dividing peak conductance after Hm1a application by the peak
conductance before. Right plot shows persistent current divided by peak
current before (black) or after (red) toxin addition. Persistent current
was determined by averaging current from the final millisecond of
depolarization to 0 mV from a holding potential of −90 mV. Vertical bars
indicate 95% confidence interval.
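
The persistent-to-peak ratio described in panel e can be sketched as below. This is a minimal illustration that assumes the trace contains only the depolarizing step, is sampled at a fixed interval, and is leak-subtracted; the function name, sampling interval and current values are hypothetical.

```python
def persistent_to_peak(step_trace, dt_ms):
    """Ratio of persistent current (mean of the final millisecond) to peak current.

    step_trace: current samples recorded during the depolarizing step only.
    dt_ms: sampling interval in milliseconds (assumed constant).
    """
    n_last = max(1, round(1.0 / dt_ms))      # samples covering the final 1 ms
    persistent = sum(step_trace[-n_last:]) / n_last
    peak = max(step_trace, key=abs)          # largest-magnitude sample, sign kept
    return persistent / peak


# Hypothetical inward current (arbitrary units) sampled every 0.5 ms.
trace = [0.0, -10.0, -6.0, -3.0, -2.0, -2.0]
print(persistent_to_peak(trace, dt_ms=0.5))
```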

Extended Data Figure 3 | Nav1.1 is expressed by myelinated, non-C-fibre
sensory neurons. a, Representative images showing expression
of various molecular markers and their overlap with Nav1.1 transcripts.
Markers include immunohistochemical staining for CGRP and tyrosine
hydroxylase (TH) and in situ histochemistry for TRPM8 and 5-HT3 ion
channel transcripts. Quantification of overlap for these markers is shown
in Fig. 3. b, Quantification of the number of toxin-responsive cells in
P0 mouse trigeminal cultures as assessed by calcium imaging (leftmost
column) and the percentage of toxin-sensitive cells that responded to other
agonists (1-(m-chlorophenyl)-biguanide (mCPBG), allyl isothiocyanate
(AITC), capsaicin, and menthol activate 5-HT3, TRPA1, TRPV1 and
TRPM8 channels, respectively), or bound the lectin IB4. c, Table including
conduction velocity and Von Frey thresholds for skin-nerve experiments
presented in Fig. 3d. Error bars represent mean ± s.e.m.


© 2016 Macmillan Publishers Limited. All rights reserved 


Extended Data Figure 4 | Control experiments. These data show
control experiments related to Fig. 4. a, Representative DRG sections
from peripherin-Cre adult mouse showing neurons that express Cre
recombinase as visualized using a floxed-stop YFP reporter mouse.
In situ hybridization histochemistry shows overlap with Nav1.1 transcripts
(right). b, Quantification of overlap between YFP and Nav1.1.
c, Comparison of ATF3 induction in DRG following sciatic nerve ligation
(SNI) or intraplantar Hm1a injection. SNI induced ATF3 expression in
>50% of DRG neurons whereas ATF3 induction after Hm1a injection
was indistinguishable from vehicle (measured 1 or 3 days post-injection).
d, Peripherin-Cre × floxed Nav1.1 mice were compared with wild-type
littermates in the rotarod test. No significant differences were observed
(unpaired Student's t-test). Error bars represent mean ± s.e.m.

Extended Data Figure 5 | A subset of colonic afferents does not express
functional Nav1.1. a, Left, representative ex vivo colonic single fibre
recording from an Hm1a (100 nM)-non-responsive high-threshold fibre
from a healthy mouse (arrows indicate application and removal of 2 g Von
Frey hair stimulus). Middle, group data showing lack of Hm1a-mediated
responses from a subset (9 out of 15) of fibres. Right, group data showing
a population (5 out of 10) of healthy, high-threshold mechanoreceptor
colonic afferents unaltered by ICA-121432 in the presence or absence
of Hm1a (100 nM). b, Left, representative whole-cell current clamp
recording of a retrogradely traced colonic DRG neuron in response to
500-ms current injection at rheobase. Recordings were made from the
same neuron of a healthy control mouse before and after incubation with
Hm1a (10 nM). Horizontal scale bar, 250 ms; vertical scale bar, 20 mV.
Middle and right, group data show no effect of Hm1a application on
electrical excitability in a sub-population (6 out of 11) of colonic DRG
neurons. c, Left, representative high-threshold mechanoreceptive colonic
fibres from CVH mice showing no change after application of Hm1a
(100 nM). Middle, group data from Hm1a-non-responsive colonic fibres
(4 out of 11). Right, group data showing a subpopulation of CVH colonic
afferents (3 out of 10) unaltered by ICA-121432 in the presence or absence
of Hm1a. d, Left, representative Hm1a-non-responsive colonic DRG
neuron in whole-cell current clamp. Middle and right, group data show
electrical excitability is unaltered by Hm1a in a subset (4 out of 11) of CVH
colonic DRG neurons. Error bars represent mean ± s.e.m. No significant
differences were observed using Student's t-test (a-d, middle; b, d, right)
or one-way ANOVA with post hoc Bonferroni test (a, c, right).

Defining the consequences of genetic
variation on a proteome-wide scale 


Joel M. Chick, Steven C. Munger, Petr Simecek, Edward L. Huttlin, Kwangbom Choi, Daniel M. Gatti,
Narayanan Raghupathy, Karen L. Svenson, Gary A. Churchill & Steven P. Gygi


Genetic variation modulates protein expression through both transcriptional and post-transcriptional mechanisms. 
To characterize the consequences of natural genetic diversity on the proteome, here we combine a multiplexed, mass 
spectrometry-based method for protein quantification with an emerging outbred mouse model containing extensive
genetic variation from eight inbred founder strains. By measuring genome-wide transcript and protein expression in 
livers from 192 Diversity Outbred mice, we identify 2,866 protein quantitative trait loci (pQTL) with twice as many local 
as distant genetic variants. These data support distinct transcriptional and post-transcriptional models underlying the 
observed pQTL effects. Using a sensitive approach to mediation analysis, we often identified a second protein or transcript 
as the causal mediator of distant pQTL. Our analysis reveals an extensive network of direct protein-protein interactions. 
Finally, we show that local genotype can provide accurate predictions of protein abundance in an independent cohort
of Collaborative Cross mice.


Regulation of protein abundance is vital to cellular functions and
environmental response. According to the central dogma, the coding
sequence of DNA is transcribed into mRNA (transcript), which in turn
is translated into protein. Although rates of transcription, translation
and degradation of both transcript and protein vary, under this
simplest model of regulation, the cellular pool of a protein is determined
by the abundance of its corresponding transcript. Genetic or
environmental perturbations that alter transcription would directly affect
protein abundance. In reality, many layers of regulation intervene in
this process, and numerous studies have been carried out to
determine whether and to what extent transcript abundance is a predictor
of protein abundance. Several studies have reported that there is
generally a low correlation between the two. An emerging consensus
is that much of the protein constituent of the cell is buffered against
transcriptional variation, but a global perspective of protein
buffering has not been put forward.

Genetic variants can influence transcript and protein levels in a
quantitative manner. Mapping quantitative trait loci (QTL) that affect
transcript (eQTL) or protein (pQTL) abundance in model organisms
or human cell lines can identify causal variants and provide a tool to
dissect the mode of regulation of gene expression. Analyses of eQTL
have yielded a global but incomplete understanding of the regulatory
mechanisms associated with gene expression. Until now, pQTL
analysis has been applied to a modest set of proteins through shotgun
proteomics or targeted protein analysis. Much of the pioneering
work behind pQTL analysis has been conducted in yeast crosses using
mass spectrometry or green fluorescent protein (GFP) fusions.
Recent advances in quantitative proteomics present the possibility
of near-comprehensive, genome-wide pQTL analysis.

To investigate how genetic variation affects transcript and protein
abundance globally requires a broad set of perturbations. The Diversity
Outbred (DO) mouse model is a heterogeneous stock derived from the
same eight founder strains as the Collaborative Cross (CC) mice
(Fig. 1a). The founder strains are fully sequenced and capture a
considerable cross-section of the genetic variation present in laboratory
and wild mouse populations. The balanced allele frequencies and
simple population structure of the DO mice provide high power and
precision for mapping QTL with small sample sizes relative
to human mapping studies. We designed a QTL mapping approach
that takes advantage of these unique properties of the DO and our
knowledge of the founder genomes. For each individual DO mouse,
we imputed the founder strain ancestry at 64,000 evenly spaced loci
across the diploid genome.


Gene and protein expression profiling 

We first applied multiplexed proteomics to evaluate the extent of pro- 
tein abundance variation among the eight DO/CC founder strains. 
Founder strain liver proteins were analysed in duplicate from both 
sexes (Extended Data Fig. 1a, Supplementary Table 1). Protein abun- 
dance was highly variable across the eight founder strains; hierar- 
chical clustering and principal component analysis suggested that 
strain was the major factor driving variation followed by sex. This 
analysis confirmed our expectation that the wild-derived founder 
strains CAST/EiJ (CAST) and PWK/PhJ (PWK) were most distinct, 
underlying much of the genetic variability in protein expression 
(Extended Data Fig. 1b-d). 

We next profiled protein and transcript levels in liver tissue from 
192 DO mice, including both females and males, with half of the 
animals fed standard rodent chow and the other half fed a high-fat 
diet (Methods, Fig. 1b and Supplementary Tables 2 and 3). In total, 
we measured 6,756 proteins and 16,921 transcripts with detection in 
at least half of the samples. Both transcript and protein abundance 
were highly variable, and principal component analysis identified sex 
and diet as major drivers of this variation (Extended Data Fig. 2a). 
As expected, many proteins displayed sex- or diet-specific protein 
expression. Known female- and male-specific proteins were selec- 
tively expressed in a sex-dependent manner (Extended Data Fig. 2b, c). 
Likewise, many proteins showed diet-specific expression, including
PPAR signalling proteins and fat and cholesterol metabolism enzymes (Extended
Data Fig. 2d, e), and many of these had concordant transcriptional
responses (Extended Data Fig. 2f-j). These results demonstrate that
sex and diet induced expected changes in transcript and protein
expression.

Figure 1 | Tandem mass tag (TMT)-based liver proteomics in 192 DO
mice. a, Overview of the breeding scheme to create the DO and CC mouse
strains. b, Experimental overview of the genotyping, transcriptomics and
proteomic analysis on 192 DO mouse livers from both sexes on a high-fat
or chow diet.


Genetic regulation of protein abundance 

In the subsequent analyses, we focused on 6,707 proteins for which 
both the protein and its corresponding transcript were detected in at 
least half of the DO liver samples. Genetic factors explained a substan- 
tial portion of variation in the abundance of protein and transcripts in 
the DO population (Extended Data Fig. 3a-f). To identify these, we 
performed QTL mapping analysis on transcript (eQTL, Supplementary 
Table 4) and protein (pQTL, Supplementary Table 5) abundance. 

We identified 2,866 pQTL for 2,552 distinct proteins at a genome- 
wide significance level of P< 0.1 (Fig. 2a). This is the largest set of 
pQTL identified so far, with tenfold greater numbers than other mass 
spectrometry (MS)-based approaches. Significant local pQTL were 
more common than distant pQTL (1,736 local and 1,130 distant pQTL) 
(Extended Data Fig. 3g). In addition, we identified 4,188 significant 
eQTL among 3,706 genes, with threefold more local than distant associ- 
ations at the transcript level (3,211 local and 977 distant eQTL; Fig. 2a, 
Extended Data Fig. 3h, i). Finally, to examine the replication rate, we 
analysed a replication set of 192 separate DO mice treated under iden- 
tical conditions for eQTL (see Methods and Extended Data Fig. 4). 

To determine whether the same genetic loci acted on transcript and 
protein abundance, we first compared the QTL maps. We observed a 
significant overlap of proteins with pQTL and eQTL (n= 1,400; hyper- 
geometric P< 1 x 107°; Fig. 2a). As expected, genes with concordant 
QTL had generally higher correlations between protein and transcript 
abundance compared to those having only pQTL, only eQTL or 
neither (Fig. 2b). Among local QTL only, we observed a high degree of 
overlap with 80% of local pQTL having a corresponding local eQTL. 
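
The overlap test mentioned above can be sketched as a one-sided hypergeometric tail probability. This is a minimal illustration, not the authors' code: the function name and the toy counts are hypothetical, and the background set used in the study is not restated here, so real inputs would have to be taken from the underlying QTL tables.

```python
import math

def hypergeom_log10_tail(overlap, n_a, n_b, total):
    """log10 of P(X >= overlap) for a hypergeometric overlap.

    n_a items are drawn without replacement from `total` items,
    of which n_b are marked; X counts the marked items drawn.
    Exact integer arithmetic avoids underflow for very small P values.
    """
    upper = min(n_a, n_b)
    tail = sum(
        math.comb(n_b, k) * math.comb(total - n_b, n_a - k)
        for k in range(max(overlap, 0), upper + 1)
    )
    if tail == 0:
        return float("-inf")  # the requested overlap is impossible
    return math.log10(tail) - math.log10(math.comb(total, n_a))

# Toy counts (hypothetical): two gene sets of 100 drawn from 1,000 genes,
# sharing 40 members.
log_p = hypergeom_log10_tail(40, 100, 100, 1000)
print(f"log10(P) = {log_p:.1f}")
```

Working on the log10 scale keeps the result representable even when the probability is far smaller than floating-point precision allows.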


[Figure 2 graphic, panels a-d. Recoverable labels: total, local and distant QTL (a); RNA-protein correlation (b); delta LOD versus pQTL LOD for local and distant QTL (c); local and distant regulation models, each transcription-mediated or post-transcriptional (d).] 
Figure 2 | Global view of the liver proteome reveals distinct genetic 
models of protein regulation. a, Venn diagram showing the distribution 
of transcripts and proteins broken down into local or distant QTL. 
b, Histograms of Pearson correlations for each gene's protein and 
transcript measurements after segregating into four groups 
(eQTL-pQTL (purple), pQTL-no eQTL (blue), eQTL-no pQTL (green) 
and no QTL (grey)). c, Local and distant pQTL LOD scores after transcript 
measurements were used as a covariate in the regression model showing 
that local pQTL were mediated through their cognate transcripts unlike 
distant pQTL. d, Model selection by Bayesian information criterion (BIC). 
Local pQTL (QTL) were mostly transcriptionally controlled, whereas 
distant pQTL (QTLp) were regulated generally by post-transcriptional 
mechanisms. 


The small number of local pQTL that lack corresponding eQTL 
(n= 344) could result from genetic variation that regulated protein 
abundance via post-transcriptional mechanisms such as coding var- 
iation that affected protein stability without altering transcript levels. 
In contrast, distant genetic variants that affected both transcript and 
protein levels seem to be nearly mutually exclusive (Fig. 2a). This 
observation leads to the intriguing hypothesis that most distant pQTL 
affected the abundance of a target protein via post-transcriptional 
mechanism(s). 

For each of the 6,707 expressed proteins, we chose the most signif- 
icant local and distant QTL, regardless of whether the log odds ratio 
(LOD) scores at each locus exceeded the pQTL detection threshold. 
We regressed out the transcript abundance and examined the effect 
on the peak LOD scores (Fig. 2c). Proteins with pQTL that are medi- 
ated through their corresponding transcript should show a reduced 
LOD score when transcript abundance is included in the regression 
model. Most local pQTL had significantly lower LOD scores after 
conditioning on their corresponding transcript (1,136 out of 1,736 
dropped by >20%), while most distant pQTL were unaffected after 
conditioning on their transcript (164 out of 1,007 dropped by >20%). 
This suggests that local pQTL were largely mediated through tran- 
scriptional mechanisms, whereas distant pQTL were more likely to 
regulate protein abundance without affecting transcript abundance. 
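
The conditioning step described above can be illustrated with a deliberately simplified sketch. It assumes a single additive genotype score per animal rather than the eight-founder haplotype model used for the actual mapping, and all function names are hypothetical. Conditioning on the transcript is implemented by residualizing both genotype and protein on transcript abundance, which gives the same LOD as adding the transcript as a covariate to both the null and the QTL model.

```python
import math

def _center(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def residuals(y, x):
    """Residuals of y after simple linear regression on x (with intercept)."""
    xc, yc = _center(x), _center(y)
    slope = sum(a * b for a, b in zip(xc, yc)) / sum(a * a for a in xc)
    return [b - slope * a for a, b in zip(xc, yc)]

def lod(genotype, phenotype):
    """LOD = (n/2) * log10(RSS_null / RSS_qtl) for one additive predictor."""
    gc, pc = _center(genotype), _center(phenotype)
    sgp = sum(a * b for a, b in zip(gc, pc))
    r2 = sgp * sgp / (sum(a * a for a in gc) * sum(b * b for b in pc))
    r2 = min(r2, 1 - 1e-12)  # guard against rounding just above 1
    return -(len(gc) / 2) * math.log10(1 - r2)

def lod_before_and_after(genotype, protein, transcript):
    """pQTL LOD without and with the transcript as a covariate."""
    before = lod(genotype, protein)
    after = lod(residuals(genotype, transcript),
                residuals(protein, transcript))
    return before, after
```

A pQTL whose LOD falls by more than 20% after conditioning would be counted as transcript-mediated under the criterion used in the text.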

We carried out a model selection analysis using Bayesian informa- 
tion criterion (BIC) to identify the most probable path relating a locus 
genotype to a protein and its corresponding transcript. We evaluated 
all 6,707 proteins using the best local and distant markers identified in 
the pQTL mapping, and recorded the path that best explained 
the observed expression data (Fig. 2d, Extended Data Fig. 5 and 
Supplementary Tables 6 and 7). We illustrate these models in Fig. 2d 
in a more simplified form and present a more complete version of 
these models in Extended Data Fig. 5. Three of the models had no 
path connecting the locus to protein abundance. For most proteins, 
these were the best-fitting models for the local QTL (n= 4,505) and for 
the distant QTL (n =5,944). The remaining models linked the abun- 
dance of a protein to either a local QTL or a distant QTL. Among 
local QTL, we found that most had effects that were mediated at least 
partially through the transcript (n = 1,579), while a minority affected 
protein abundance independently of the transcript (n = 623). Among 
distant QTL, a much smaller proportion acted through the transcript 
(n=17), and most affected protein abundance independently of the 
transcript (n = 746). We conclude that most local pQTL affected 
both protein and transcript abundance, consistent with a transcrip- 
tional mode of regulation. However, distant pQTL affected protein 
abundance independently of the transcript, consistent with a post- 
transcriptional mode of regulation. 


Local pQTL effect on protein abundance 

We highlight two examples that illustrate the most common models 
of local regulation. DHTKD1 exemplifies a pQTL in which a local 
genetic variant affected transcript abundance that was transmitted 
to the protein (Fig. 3a, b). This simple transcript-to-protein model 
of regulation was evidenced by the high correlation between tran- 
script and protein abundance (Fig. 3b, inset) and loss of the pQTL 
when transcript abundance was added as a covariate in the regression 
model (Supplementary Table 8). Founder strain allelic contributions 
derived from the pQTL mapping model suggested that four founder 
strain alleles (129S1, CAST, PWK and WSB) shared the genetic 
variant and exhibited higher protein expression levels. To validate 
these findings, a comparison of these expression coefficients to 
founder strain data showed the same expression profiles (Fig. 3c). 
Using genome sequences of the founder strains”®, we identified a 
candidate causal genetic variant—a 1-kb deletion in intron 1 of the 
gene. The same variant was previously reported as a pQTL in the DBA 
mouse strain!’, DHTKD1 was just one of almost 1,600 cases in which 


502 | NATURE | VOL 534 | 23 JUNE 2016 


QTL-to-transcript-to-protein regulation was identified as the best 
local model. Additional examples include Ces2h and Pipox (Extended 
Data Fig. 6). 

A total of 623 proteins had local pQTL that affected protein abun- 
dance directly, including OMA1 (Fig. 3d-f). These proteins were 
uncoupled from their transcript, as evidenced by the lack of corre- 
lation between protein and transcript abundance (Fig. 3e, inset). For 
Oma1, founder allele contributions in the DO population pointed to 
an allele from the CAST strain causing reduced protein levels. This 
was validated by protein expression in the founder strains (Fig. 3f). 
Genome analysis identified four missense mutations in Oma1 (H73N, 
R97Q, I127K and V283L), suggesting that protein structure may be 
affected and not the transcript. Other examples of variants that influ- 
enced protein expression that were not mediated through transcripts 
include Entpd5 and Lars2 (Extended Data Fig. 6). 


Causal intermediates of distant pQTL 

Unlike local pQTL, in which the causative variant is directly linked 
to the target protein-coding gene, distant pQTL exert their effects on 
target proteins in trans through a causal intermediate. To determine 
whether a distant pQTL acts proximally through the transcript of 
the affected protein or directly on the protein bypassing the transcript, 
we used mediation analysis (see Methods). We examined 1,130 distant 
pQTL and identified at least one candidate protein or transcript mediator 
for 743 (Supplementary Table 8). In total, we found 618 unique 
protein/transcript mediators, of which 534 regulated a single protein, 
61 regulated two proteins, and 23 regulated three or more proteins. 
Furthermore, 84% of the top candidate protein mediators were 
themselves driven by a local pQTL. This illustrates that a single local QTL, 
Figure 4 | Mediation of distant pQTL reveals 
network interactions in the liver proteome. 
acting proximally on a transcript or protein intermediate, can effectively 
control the abundance of a distant protein or multiple distant proteins, 
uncoupling them from their transcriptional control mechanisms. 

We highlight examples in which mediation analysis identified 
the regulatory protein or transcript underlying the distant pQTL. 
TMEM68 protein exemplified a post-transcriptional model of regu- 
lation (Fig. 4a). TMEM68 has a distant pQTL peak on chromosome 13, 
and the Tmem68 transcript has a local QTL on chromosome 4 (Fig. 4b, 
Supplementary Table 4). The protein and transcript levels were uncor- 
related (Fig. 4d, left). We identified both NNT protein and Nnt tran- 
script on chromosome 13 as candidate mediators of the distant pQTL 
for TMEM68 (Fig. 4c). The Nnt protein and transcript shared a local 
QTL indicating a transcriptional mechanism. Both Nnt protein and 
transcript were highly correlated with TMEM68 abundance (Fig. 4d). 
Founder allele expression patterns inferred at the distant pQTL 
suggest that a variant in B6 mice causes a downregulation in NNT pro- 
tein levels, which was validated by proteomic analysis of the founder 


strains (Fig. 4e). This effect on Nnt expression has been previously 
attributed to a small exonic deletion found only in the B6 strain”*-”, 
Using this same approach, we reconfirmed numerous known protein- 
protein associations including SNX7-SNX4, PGAM1-PGAM2 and 
LRRFIP1-FLII (refs 31-33), and inferred many new associations 
(Extended Data Fig. 7). 

The chaperonin containing TCP1 (CCT) complex illustrates how 
mediation analysis can reveal larger co-regulated complexes and path- 
ways (Fig. 4f). All eight subunits of the CCT complex shared a distant 
pQTL (but not distant eQTL) on chromosome 5 with the same pattern 
of allele effects. We identified the transcript and protein abundance of 
Cct6a as mediators of this post-transcriptional distant effect (Fig. 4g, h). 
This relationship is evident by the high correlation in protein- 
protein and protein-transcript abundance between Cct6a and other 
complex members (Fig. 4i, Extended Data Fig. 8). Founder strain 
allele effects inferred at the distant pQTL showed that DO animals 
containing the NOD strain allele on chromosome 5 expressed lower 
Figure 5 | Genotype can be an accurate predictor of protein abundance. 
a, Founder strain protein abundance values inferred at significant pQTL 
in the DO population closely match measured abundance values from 
the founder strains themselves. The distributions of Pearson correlations 
are plotted for local pQTL and distant pQTL. Local pQTL are generally 
more predictive of abundance values in the founder strains (local median 
r= 0.72, distant median r=0.11). b, Founder strain allele predictions 
from the DO were also assessed against protein abundance data collected 
from four CC strains (n = 6 mice per strain). We observe that local pQTL 
are more predictive of protein abundance in the CC strains (local median 
r= 0.63; distant median r= 0.22). c, Predictive power depends largely 

on the significance of the pQTL. Local pQTL generally had higher LOD 
scores, and as such we had higher power to predict these proteins (n = 4 
mice for each founder, 2 male and 2 female, black bars represent median 
values). An example is shown for LYPLAL1. d, Protein abundance could 
also be predicted for genes with significant distant pQTL in the DO 
population; however, as a group these predictions were modest compared 
to local pQTL. As an example, NAGS abundance in the CC strains could 
be predicted based on the local genotype at its mediator protein, GLYCTK 
(n=6 mice for each CC strain, 3 male and 3 female, black bars represent 
median values). 


overall levels of the entire complex. This same pattern was observed 
in the founder strains (Fig. 4j). Genome sequence analysis identified 
a variant (rs228180583) in a conserved KLF4-binding domain in the 
Cct6a promoter region that was present only in the NOD strain. From 
these data, we propose that the variant lowers Cct6a transcript and 
protein abundance, which results in a stoichiometric imbalance and 
degradation of excess unbound complex members. These examples 
highlight the power of mediation analysis to identify protein-protein 
associations and co-regulated groups of proteins. 


Genetic perturbations reveal protein networks 

By leveraging the large number of distant pQTL and mediation analysis 
of each, we created a network of pQTL-regulated proteins (Extended 
Data Fig. 9a). Each distant pQTL is causally linked to its target protein 
with mediators and other co-regulated proteins to form a network. 
When merged across all 1,130 distant pQTL, the network comprises 
5,794 causal or co-regulatory relationships among 3,938 proteins or 
QTL. Markov cluster algorithm (MCL) clustering defined 671 clusters 
of variable sizes (Extended Data Fig. 9b). Approximately 44% of clus- 
ters included members with shared biological functions as assessed 
by Gene Ontology (GO) enrichment (Extended Data Fig. 9c). As an 
example, almost all cholesterol synthesis enzymes were determined to 
be co-regulated and associated with just two distant pQTL that affected 
the protein expression for Lss and Cyp51 (Supplementary Table 8). 
Clusters found within the larger regulatory network tended to associate 
proteins with shared biological properties. Some clusters grouped 
proteins according to subcellular localization, as seen for complex I 
of the electron transport chain (Extended Data Fig. 9d), SUCLG1/ 
SUCLG2 and associated mitochondrial proteins (Extended Data 
Fig. 9e), and IMMT/SAMM50 with other mitochondrial proteins 
(Extended Data Fig. 9f). Each corresponds to a well-studied complex, 
suggesting that the regulatory network emerging from mediation anal- 
ysis provides an accurate snapshot of mouse liver gene regulation. 

To probe further the correspondence between protein co-regulation 
and physical association, each pQTL and its co-regulated proteins were 
mapped onto an ongoing and recently published human interactome 
network34. Physical associations accounted for a significant subset of 
protein regulatory networks, especially among distant QTL (Extended 
Data Fig. 9g-l). Through these findings, we propose that a considerable 
fraction of distant pQTL were the direct result of post-transcriptional 
regulation of proteins that had similar biological functions, cell 
locations, and/or complex membership. 


Genotype is a predictor of protein abundance 

For many genes with pQTL, founder strain allele patterns inferred 
from the DO pQTL mapping model closely matched protein abun- 
dance measured in the founder strains themselves. To determine the 
extent to which genotype can be a predictor of protein abundance, 
we examined all significant pQTL and compared the founder strain 
coefficients observed at the pQTL location to the protein levels meas- 
ured in the founder strains (Fig. 5a). We found that predictive power 
increased with the significance of the pQTL (Fig. 5a, Extended Data 
Fig. 10). Because of their tight linkage to the controlled gene, local 
pQTL tended to have higher predictive power than distant loci (local 
pQTL median r =0.72; distant pQTL median r=0.11). However, 
highly significant distant pQTL (>10 LOD) have comparable predic- 
tive power to local pQTL of similar significance. 

We further validated our strain predictions using the quantitation 
of ~6,500 proteins from four CC strains (Supplementary Table 9). 
For each pQTL, we identified the genotype in the CC strains and pre- 
dicted the protein abundance using the DO proteomics data. Our data 
suggest that strain genotype is also predictive of protein abundance 
in the CC strains (Fig. 5b). The predictive power was higher for local 
pQTL than distant ones. As an example, LYPLAL1 was identified with 
a local pQTL in the DO population and was predicted to have lower 
protein abundance in the CC001 and CC003 strains (Fig. 5c). For 
distant pQTL with high LOD scores, the predictive power was also 
high. For distant pQTL, these predictions were made by comparing the 
measured protein and the genotype at the QTL location. For example, 
GLYCTK protein abundance was predicted using the genotype at the 
Nags gene location where the variant was detected (Fig. 5d). 
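
The prediction step can be sketched as a weighted sum of founder allele effects. This is an illustrative assumption about the form of the calculation rather than the authors' implementation: the function name and the numeric effects below are hypothetical, and an inbred CC strain is taken to carry a single founder haplotype at the locus.

```python
# The eight DO/CC founder strains, in a fixed order.
FOUNDERS = ["A/J", "B6", "129S1", "NOD", "NZO", "CAST", "PWK", "WSB"]

def predict_abundance(founder_effects, haplotype_probs):
    """Expected abundance at a pQTL.

    founder_effects: founder name -> allele effect estimated in the DO.
    haplotype_probs: founder name -> probability that the strain carries
                     that founder haplotype at the pQTL (missing = 0).
    """
    return sum(haplotype_probs.get(f, 0.0) * founder_effects[f]
               for f in FOUNDERS)

# Hypothetical allele effects (arbitrary units) for one protein.
effects = {"A/J": 0.2, "B6": 0.1, "129S1": 0.3, "NOD": 0.0,
           "NZO": 0.1, "CAST": -1.2, "PWK": 0.4, "WSB": 0.1}

# A strain that is homozygous for the CAST haplotype at this locus.
print(predict_abundance(effects, {"CAST": 1.0}))
```

Predictions made this way for each strain can then be correlated with the measured abundances, as summarized by the median r values in Fig. 5.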

This study quantified both protein and transcript abundance in a 
genetically diverse population of mice, mapping their genetic archi- 
tecture. We identified the largest catalogue of pQTL so far, which can 
be attributed to two variables in our experimental design. First, we 
have improved the accuracy and sensitivity of quantification for both 
protein and transcript abundance. Second, our experimental popu- 
lation captured genetic diversity far in excess of the human popula- 
tion and standard laboratory mouse strains. Earlier studies reported 
a disconnect between transcript and protein abundance”*®, which 
has also been a conclusion drawn from several recent eQTL-pQTL 
analyses*”!”°. Data here show that local QTL tend to abide by the central 
dogma as demonstrated by concordant effects on transcripts and 
proteins, whereas distant pQTL are conferred by post-transcriptional 
mechanisms. Our mediation analysis provided the ability to identify 
causal protein intermediates underlying distant pQTL and led to the 
identification of hundreds of protein-protein associations. Our exper- 
imental design provides an advantage over protein interaction maps 
because genetic mapping is not dependent on physical interactions. 
This conclusion is further exemplified by the co-regulation of protein 
complexes or biochemical pathways in this study. Stoichiometric buff- 
ering provides one explanation for co-regulation of protein complexes 
and may account for earlier observations that protein abundances 
(but not transcript abundances) of orthologues are well-conserved 
across large evolutionary distances***”. 

These findings suggest a new predictive genomics framework in 
which quantitative proteomics and transcriptomics are combined in the 
analysis of a discovery population like the DO to identify genetic inter- 
actions. Next, pathways relevant to the tissue/physiological phenotype 
of interest are intersected with the list of significant pQTL. Pathways 
enriched for proteins with significant pQTL should be amenable to 
manipulation in the founder and CC strains. That is, the founder 
allele effects inferred at the pQTL can be combined in such a way via 
crosses of CC strains to tune pathway output. Moreover, as we better 
understand the types of mutation that can affect protein abundance, 
we can introduce specific mutations with gene editing into sensitized 
or robust genetic backgrounds. We foresee this strategy being used to 
design reproducible rodent models that span a range of human-relevant 
phenotypes, for example, in drug metabolism or toxicology studies. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS 


The sample size (192 animals) was calculated based on previous experimental 
RNA-seq data and was determined to be sufficient to detect genetic effects that 
explain 10% or more genetic variation with 90% power and 10° type I error rate. 
Randomization was used to assign mice to treatments and samples to batches, bar 
codes, and TMT tags in both the RNA-seq and proteomics experiments. Data col- 
lection was carried out by automation, and as such there was no need for blinding 
the sample identifiers. 

Animals and genotyping: DO mice. Diversity Outbred mice (DO, stock no. 
009376) were obtained from The Jackson Laboratory (JAX) at 3 weeks of age, 
housed at JAX, and fed either standard rodent chow (6% fat by weight, LabDiet 
5K52; LabDiet, Scott Distributing) or a high-fat diet (44.6% kcal fat and 34% kcal 
sucrose by weight, TD.08811, Harlan Laboratories) from wean age throughout 
the study. In total, 192 DO mice were analysed in the current study, including 
50 females and 48 males raised on standard chow, and 48 females and 46 males 
raised on the high fat diet. At 26 weeks of age, animals were euthanized, dissected, 
and liver samples were sent for RNA-seq analysis at JAX (samples stored in 
RNAlater solution; Life Technologies) and proteomics analysis at Harvard Medical 
School (HMS; samples sent as snap frozen tissue). 

Animals and genotyping: founder strain and CC mice. Two male and two female 
mice from each of the eight DO/CC founder inbred strains, and three male and 
three female mice from each of four CC recombinant inbred strains (CC001, CC003, 
CC004 and CC017), were obtained from and housed at JAX and raised on the standard chow diet. 
Founder strain mice were euthanized at 26 weeks of age, and the CC mice were 
euthanized at 8-16 weeks of age. Liver samples were dissected from each mouse, 
snap frozen and sent to HMS for proteomics analysis. All procedures on mice were 
approved by the Animal Care and Use Committee at JAX. 

Multiplexed quantitative proteomic analysis of mouse livers: sample preparation 
and TMT labelling. A total of 192 DO mouse livers (~50 mg), 32 founder 
strain livers (8 founder strains, 2 male and 2 female replicates for each strain) 
and 24 CC strain livers (4 strains, 3 male and 3 female replicates for each strain) 
were homogenized in 1 ml lysis buffer (1% SDS, 50 mM Tris, pH 8.8 and Roche 
complete protease inhibitors). Samples were reduced with 5 mM dithiothreitol for 
30 min at 37 °C followed by alkylation with 15 mM for 30 min at room temperature 
in the dark. The alkylation reaction was quenched by adding 5 mM dithiothreitol 
for 15 min at room temperature in the dark. A 500 µl aliquot was then methanol/ 
chloroform precipitated. The samples were allowed to air dry before being resus- 
pended in 1 ml of 8M urea and 50 mM Tris, pH 8.8. The urea concentration was 
diluted down to ~1.5M urea with 50 mM Tris. Proteins were quantified using a 
BCA assay. Protein was then digested using a combination of Lys-C/trypsin at 
an enzyme-to-protein ratio of 1:100. First, protein was digested overnight with 
Lys-C followed by 6-h digestion with trypsin all at 37°C. Samples were then acid- 
ified using formic acid to approximately pH 3. Samples were then desalted using 
a SepPak column. Eluents were then dried using a vacuum centrifuge. Peptide 
pellets were resuspended in 110 µl of 200 mM HEPES buffer, pH 8, and peptides 
were quantified by a BCA assay. Approximately 70 µg of peptides (100 µl of 
sample + 30 µl of 100% acetonitrile) were then labelled with 15 µl of 20 µg µl−1 of 
the corresponding TMT 10-plex reagent (DO or founder strains) or TMT 8-plex 
reagent (CC strains) for 2 h at room temperature. The reaction was quenched using 
8 µl of 5% hydroxylamine for 15 min. Peptides were then acidified using 150 µl of 
1% formic acid, and each set of 10 samples was mixed and desalted using a SepPak 
column. In total, 25 TMT 10-plex reactions and 3 8-plex reactions were performed 
(21 DO mice, 4 founder strains and 3 CC strains). The full labelling schemes for 
the DO mice, the founder strains and CC strains are provided as supplementary 
tables (Supplementary Tables 1, 3 and 7). 

Basic reverse-phase fractionation. Each of the 28 TMT experiments was sep- 
arated by basic, reversed-phase chromatography. Samples were loaded onto 
an Agilent 300 Extend C18 column (5 µm particles, 4.6 mm ID and 220 mm in 
length). Using an Agilent 1100 quaternary pump equipped with a degasser and 
a photodiode array detector (set at 220- and 280-nm wavelength), peptides were 
separated using a 50 min linear gradient from 18% to 40% acetonitrile in 10 mM 
ammonium bicarbonate, pH 8, at a flow rate of 0.8 ml min−1. Peptides were 
separated into a total of 96 fractions that were consolidated into 24. Samples 
were subsequently acidified with 1% formic acid and vacuum centrifuged to near 
dryness. Each fraction was desalted via StageTip, dried via vacuum centrifugation, 
and reconstituted in 1% formic acid for liquid chromatography tandem mass 
spectrometry (LC-MS/MS) processing. 

Liquid chromatography electrospray ionization tandem mass spectrometry 
(LC-ESI-MS/MS). Peptides from every odd fraction (12 fractions total) from 
basic reverse-phase fractionation were analysed using an Orbitrap Fusion Tribrid 
mass spectrometer (Thermo Scientific) equipped with a Proxeon ultra high pressure 
liquid chromatography unit. Peptide mixtures were separated on a 100 µm 
ID microcapillary column packed first with ~0.5 cm of 5 µm Magic C18 resin 
followed by 40 cm of 1.8 µm GP-C18 resin. Peptides were separated using a 3-h 
gradient of 6-30% acetonitrile in 0.125% formic acid with a flow rate of 
~400 nl min−1. In each data collection cycle, one full MS scan (400-1,400 m/z) 
was acquired in the Orbitrap (1.2 x 10° resolution setting and an automatic gain 
control (AGC) setting of 2 x 10°). The subsequent MS2-MS3 analysis was con- 
ducted with a top 10 setting or a top speed approach using a 2-s duration. The most 
abundant ions were selected for fragmentation by collision induced dissociation 
(CID). CID was performed with a collision energy of 35%, an AGC setting of 
4 x 10°, an isolation window of 0.5 Da, a maximum ion accumulation time of 
150 ms and the rapid ion trap setting. Previously analysed precursor ions were 
dynamically excluded for 40s. 

During the MS3 analyses for TMT quantification, precursors were isolated using 
a 2.5-Da m/z window and fragmented by 35% CID in the ion trap. Multiple fragment 
ions (SPS ions) were co-selected and further fragmented by HCD. Precursor 
ion selection was based on the previous MS2 scan, and the MS2-MS3 analysis was 
conducted using synchronous precursor selection (SPS) methodology. HCD used for the 
MS3 was performed using 55% collision energy and reporter ions were detected 
using the Orbitrap with a resolution setting of 60,000, an AGC setting of 50,000 
and a maximum ion accumulation time of 150 ms. 
Database searching and reporter ion quantification. Software tools were 
used to convert mass spectrometric data from raw file to the mzxml format**. 
Erroneous charge state and monoisotopic m/z values were corrected as per previous 
publication*, MS/MS spectra assignments were made with the Sequest algorithm"! 
using an indexed Ensembl database (mouse: Mus_musculus NCBIM37.61). 
Databases were prepared with forward and reversed sequences concatenated 
according to the target-decoy strategy”. All searches were performed using a static 
modification for cysteine alkylation (57.0215 Da) and TMT on the peptide N ter- 
mini and lysines. Methionine oxidation (15.9949 Da) was considered a dynamic 
modification. Mass spectra were searched with trypsin specificity using a precursor 
ion tolerance of 10 p.p.m. and a fragment ion tolerance of 0.8 Da. Sequest matches 
were filtered by linear discriminant analysis as described previously, first to a data 
set level error of 1% at the peptide level based on matches to reversed sequences”. 
Peptide probabilities were then multiplied to create protein rankings and the data 
set was again filtered to a final data set level error of 1% false discovery rate (FDR) 
at the protein level. The final peptide-level FDR fell well below 1% (~0.2% peptide 
level). Peptides were then assigned to protein matches using a reductionist model, 
where all peptides were explained using the least number of proteins. 
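
The target-decoy filtering described above can be sketched as follows. This is a simplified illustration with hypothetical function names: it assumes each match carries a single score, whereas the study filtered Sequest matches on a linear discriminant score, and it estimates the false discovery rate as the ratio of reversed-sequence (decoy) hits to forward (target) hits above a cutoff.

```python
def decoy_fdr(matches, threshold):
    """Estimated FDR among target matches scoring at or above threshold.

    matches: list of (score, is_decoy) tuples.
    """
    targets = sum(1 for s, d in matches if s >= threshold and not d)
    decoys = sum(1 for s, d in matches if s >= threshold and d)
    if targets == 0:
        return 0.0 if decoys == 0 else float("inf")
    return decoys / targets

def threshold_for_fdr(matches, max_fdr=0.01):
    """Lowest score cutoff whose estimated FDR is at or below max_fdr."""
    best = None
    for score in sorted({s for s, _ in matches}, reverse=True):
        if decoy_fdr(matches, score) <= max_fdr:
            best = score  # keep lowering the cutoff while the FDR allows it
    return best
```

Applying the same idea first to peptides and then to protein-level rankings gives the two-stage 1% filtering outlined in the text.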

Peptide quantitation using TMT reporter ions was accomplished as previ- 
ously published”!”. In brief, a 0.003 Da m/z window centred on the theoretical 
m/z value of each reporter ion was monitored for each of the 8-10 reporter ions, 
and the intensity of the signal closest to the theoretical m/z value was recorded. 
TMT signals were also corrected for isotope impurities based on the manufacturer’s 
instructions. Peptides were only considered quantifiable if the total signal-to-noise 
for all channels was >200 and the isolation specificity was >0.75. Within each 
TMT experiment, peptide quantitation was normalized by summing the values 
across each channel and then each channel was corrected so that each channel had 
the same summed value. Protein quantitation was performed by summing the 
signal-to-noise for all peptides for a given protein. Protein quantitative measure- 
ments were then scaled to 100 (equal expression across all channels would be 
a value of 10). Normalization across each of the 10-plex experiments was then 
performed using quantile normalization. 
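
The within-experiment normalization steps above can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the common per-channel total is taken to be the mean of the channel sums (the text does not say which target was used), and the final quantile normalization across experiments is not shown.

```python
def normalize_channels(peptides):
    """Rescale channels so that every channel has the same summed signal.

    peptides: list of (protein_id, [signal-to-noise per TMT channel]).
    """
    n_channels = len(peptides[0][1])
    totals = [sum(values[c] for _, values in peptides)
              for c in range(n_channels)]
    target = sum(totals) / n_channels  # assumed common total: mean of sums
    factors = [target / t for t in totals]
    return [(pid, [v * f for v, f in zip(values, factors)])
            for pid, values in peptides]

def protein_profiles(peptides):
    """Sum peptide values per protein, then scale each protein to 100."""
    sums = {}
    for pid, values in peptides:
        acc = sums.setdefault(pid, [0.0] * len(values))
        for c, v in enumerate(values):
            acc[c] += v
    return {pid: [100.0 * v / sum(acc) for v in acc]
            for pid, acc in sums.items()}

# Toy two-channel example with hypothetical values.
peptides = [("P1", [10.0, 20.0]), ("P1", [30.0, 60.0]), ("P2", [20.0, 40.0])]
print(protein_profiles(normalize_channels(peptides)))
```

With ten channels, a protein expressed equally in all samples would receive a value of 10 in each channel, matching the scaling described in the text.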

Statistical analyses. Principal component analysis was performed using Cluster 3.0 
(ref. 43). Hierarchical clustering, K-means clustering and ANOVA were performed 
using Multiexperiment Viewer. Analysis on the founder strains proteomics data 
sets was performed using an ANOVA and adjusted for multiple testing using the 
Benjamini-Hochberg FDR procedure. 
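
For reference, the Benjamini-Hochberg adjustment can be written in a few lines. This is a generic sketch of the standard step-up procedure with a hypothetical function name, not the code used in the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A test is called significant at a given FDR when its adjusted P value falls at or below that level.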

Implications of multiplexed quantitative proteomics platform. Improvements 
in several aspects of the analysis pipeline enabled the increase in scale. Our quan- 
titative proteomics technology proved instrumental as it supported multiplexing 
with ten different mouse livers in the same analysis. Accurate expression meas- 
urements were obtained by applying a notched isolation waveform on an Orbitrap 
Fusion instrument. The time required to collect expression profiles from each 
10-plex was 36 h, or ~4 h of mass spectrometry analysis time per mouse liver. 
The proteome-wide analysis of 192 livers thus required 35 days. As a result of 
these methodology improvements, we detected tenfold more pQTL than previous 
MS-based reports. 

Genotyping of DO and CC samples: DO samples. Genomic DNA was extracted 
from each DO mouse (n= 192 total samples) and genotyped at 57,973 single 
nucleotide polymorphisms (SNPs) on the Mega-MUGA platform (Geneseek). 
A total of 177 out of 192 samples passed SNP quality control metrics. For these 
samples, founder haplotypes were inferred from SNP probe intensities using a 
hidden Markov model implemented in the DOQTL R package”, and then used 
to interpolate a grid of 64,000 evenly-spaced genetic intervals. In addition, founder 
haplotypes were independently inferred from the RNA-seq data by genotyping by 
RNA-seq (GBRS) protocol (see next section) and interpolated to the same 64,000 
interval grid. 

For each sample, we verified that the haplotype reconstructions agreed between 
the DNA Mega-MUGA and GBRS reconstructions by calculating the Pearson 
correlation between each pair of samples. When a Mega-MUGA sample had a 
correlation below 0.4 with the same sample ID in the RNA-seq data, we assumed 
that this sample was mismatched. We searched the RNA-seq data for the correct 
match to the Mega-MUGA sample by looking for another sample that was more 
highly correlated. If we found an RNA-seq sample with a correlation >0.4 that 
was not assigned to another sample, we matched it with the Mega-MUGA sample. 
When a sample was removed from the Mega-MUGA data for technical reasons, 
we used the GBRS haplotype reconstructions (samples F326, F328, F362, F363, 
F368, M377, M388, M392, M393, M394, M404, M408, M411, M419 and M425). 
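
The sample-matching rule described above can be sketched as follows. This is a hedged illustration: the function names, the dictionary layout (sample ID mapped to a flattened haplotype-probability vector) and the choice of the single best unclaimed candidate are assumptions, while the 0.4 cut-off comes from the text.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def match_samples(dna, rna, threshold=0.4):
    """Map each DNA sample ID to the RNA-seq sample ID it agrees with."""
    matches = {}
    # Pass 1: keep pairs whose own IDs already agree.
    for sample_id, vector in dna.items():
        if sample_id in rna and pearson(vector, rna[sample_id]) >= threshold:
            matches[sample_id] = sample_id
    claimed = set(matches.values())
    # Pass 2: for mismatched samples, search the unclaimed RNA-seq samples.
    for sample_id, vector in dna.items():
        if sample_id in matches:
            continue
        candidates = [
            (pearson(vector, rna[other]), other)
            for other in rna
            if other not in claimed
        ]
        if candidates:
            best_r, best_id = max(candidates)
            if best_r > threshold:
                matches[sample_id] = best_id
                claimed.add(best_id)
    return matches
```

Samples that remain unmatched after the second pass are simply absent from the returned dictionary.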

Genotyping of DO and CC samples: CC samples. Founder haplotypes for the CC
strains were downloaded from the CC strain database
(csbio.unc.edu/CCstatus/gstemp/AllImageHapAndGenotypeFiles.zip) maintained at
the University of North Carolina.

Transcriptome profiling and GBRS. Total liver RNA was isolated from each of 
the 192 DO mice and sequenced by single-end RNA-seq as previously described.
We aligned raw reads against pooled transcriptomes of the eight founder strains. 
To construct the pooled transcriptome, we incorporated founder strain-specific 
SNPs and insertions/deletions (Sanger REL-1410) into the reference strain genome 
sequence (GRCm38/mm10) to produce strain-specific genomes. We derived tran- 
script sequences for all annotated genes (Ensembl version 75 gene annotation) 
from each strain genome, and then combined the eight founder allele sequences 
for each transcript into one pooled transcriptome for read alignment. After
alignment, we quantified expected read counts expressed from each transcript allele
using an expectation maximization algorithm (EMASE,
https://github.com/churchill-lab/emase). We repeated the same process for liver
RNA-seq data from the eight founder strains to assess how specifically reads from
each founder align back to their strain of origin when all other founder alleles
are present in the alignment pool. We then evaluated the genotype probability of
each transcript using a hidden Markov model (HMM) that combines those read
counts and calculates (1) how likely the allele-specific read counts are to have
been generated from a specific genotype, and (2) how consistent those likelihoods
are with those of neighbouring transcripts. Finally, we re-quantified total and
allele-specific expression with EMASE by repeating a similar process, this time
using individualized diploid transcriptomes reconstructed from our genotype calls.

QTL mapping of transcript and protein abundance. Quantitative proteomics 
combined with transcript quantitation by RNA-seq makes it possible to define the 
relative contributions of transcriptional versus post-transcriptional mechanisms 
and local versus distant effects on protein abundance. For example, a local QTL 
is a genetic variant near the target gene that influences its expression; it might be 
expected to act in cis and affect both transcript and protein levels. By contrast, 
distant QTL exert their effect on a target gene’s expression in trans, most likely via 
a causal intermediate such as another protein or RNA species. Identifying causal 
intermediates of distant QTL effects may reveal novel protein-protein associations 
and their biological consequences. Our comprehensive pQTL analysis yielded a 
global network of interactions that shed new light on the regulation of protein 
abundance. 

QTL mapping. For mapping of pQTL and eQTL, we included only proteins that 
were present (non-0) in >96 samples and corresponded to gene identifiers in the 
RNA-seq data that were also expressed in >96 samples. A total of 6,707 proteins 
met these criteria. For pQTL mapping with the proteomics data, protein
abundance values were first quantile-normalized and transformed to rank normal
scores, and then pQTL were mapped with the R package DOQTL, using a linear
mixed model with sex, diet and TMT tag as additive covariates and a random
polygenic term to account for genetic relatedness among the DO animals. For
eQTL mapping from the RNA-seq data, gene-level counts were first normalized to 
the upper quartile value and transformed to rank normal scores, and then eQTL 
were mapped with DOQTL including sex, diet and batch as additive covariates and 
a random polygenic term to account for relatedness. We used the 64 k genotype 
matrix derived from Mega-MUGA DNA genotypes as input for pQTL and eQTL 
mapping, with the exception of samples with missing or low-quality DNA genotype
results, for which we used GBRS-derived genotypes.
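
The transformation to rank normal scores can be sketched as below. The offset r/(n + 1) and the averaging of tied ranks are assumptions made for this illustration, since the text does not state the exact convention; the function name is hypothetical.

```python
from statistics import NormalDist


def rank_normal_scores(values):
    """Map values to standard normal quantiles of their (average) ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        # Extend the block while the next sorted value is tied with this one.
        end = start
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # 1-based average rank
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    normal = NormalDist()
    # Dividing by n + 1 keeps every quantile strictly between 0 and 1.
    return [normal.inv_cdf(rank / (n + 1)) for rank in ranks]
```

Because only ranks are used, the transformed values are insensitive to outliers and to the scale of the original measurements.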

Statistical analyses. Significance thresholds were established by performing 10,000
permutations and fitting an extreme value distribution to the maximum LOD
scores. Permutation-derived P values were then converted to q-values with the
QVALUE R package, using the bootstrap method to estimate π0 and the default λ
tuning parameters. The significance threshold for declaring a QTL was set at a
genome-wide significance level of P < 0.1 (FDR = 10%).
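
The logic of a permutation-based genome-wide threshold can be illustrated with a short Python sketch. Note the substitution: this sketch uses empirical quantiles of the permutation maxima rather than the fitted extreme value distribution described above, and the function names are hypothetical.

```python
import math


def permutation_p_value(observed_lod, permuted_max_lods):
    """Genome-wide P value for one observed LOD score.

    Counts the permutations whose genome-wide maximum LOD reaches the
    observed value, with the usual +1 correction.
    """
    exceed = sum(1 for lod in permuted_max_lods if lod >= observed_lod)
    return (exceed + 1) / (len(permuted_max_lods) + 1)


def empirical_threshold(permuted_max_lods, alpha):
    """LOD threshold exceeded by at most a fraction alpha of permutations."""
    ordered = sorted(permuted_max_lods)
    n = len(ordered)
    # Number of permutation maxima allowed to lie above the threshold.
    n_allowed = min(math.floor(alpha * n), n - 1)
    return ordered[n - 1 - n_allowed]
```

Each entry of permuted_max_lods is assumed to be the largest LOD score from one genome scan of a phenotype whose sample labels were shuffled.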

eQTL replication analysis. Detecting a pQTL or eQTL requires a statistical signal
strong enough to exceed stringent genome-wide significance thresholds. We considered
the possibility that lack of concordance between distant pQTL and eQTL could
be explained by low power, especially for the distant pQTL. The proteomics data
in this study were obtained on a subset (discovery set) of DO mice from an earlier
study. We created a replication set for the eQTL by random sampling of 192
additional DO samples. As expected, the likelihood of replicating an eQTL depended
on the significance of the QTL in the discovery set (Supplementary Fig. 4a). Local
eQTL tend to be more significant and replicated well across experiments (76%
replication, n = 2,448), while distant eQTL replicated poorly (5% replication, n = 52;
Supplementary Fig. 4a). The distribution of LOD scores is similar for distant pQTL
and distant eQTL (Supplementary Fig. 4b), suggesting that our power to detect
distant pQTL was as low as our power to detect distant eQTL. While the overlap
between distant pQTL and eQTL is lower than we had expected (<1%, n = 9;
Supplementary Fig. 4c), it is still difficult to rule out a low rate of detection as a
possible explanation. We provide additional evidence that distant pQTL act through
post-transcriptional mechanisms.

Model selection by BIC. For each of the 6,707 proteins in the discovery set with 
detectable transcript and protein abundance, we identified (1) the locus within 
±10 Mb of the gene midpoint with the highest LOD score (local), and (2) the
locus on a separate chromosome with the highest LOD score (distant), regardless
of their statistical significance. Next, for each local and distant locus, we
considered all possible relationships among locus genotype, transcript abundance,
and protein abundance. We computed the BIC score for each of eight possible
models. For each protein, we recorded the optimal local and distant locus model
(that is, the model that yields the lowest BIC score). In addition, we calculated the
Bayesian posterior probability (assuming a uniform prior over relationships), 
and from these posterior probabilities estimated the expected number of proteins 
for each model. 
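
A minimal Python sketch of how BIC scores can be turned into posterior model probabilities under a uniform prior, and then summed into expected model counts, is given here. The function names are hypothetical, and the exp(-BIC/2) weighting is the standard large-sample approximation rather than a detail confirmed by the text.

```python
import math


def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion; lower values indicate better models."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood


def posterior_from_bic(bic_scores):
    """Approximate posterior model probabilities under a uniform prior."""
    best = min(bic_scores)
    # Subtracting the best score keeps the exponentials numerically stable.
    weights = [math.exp(-(score - best) / 2.0) for score in bic_scores]
    total = sum(weights)
    return [weight / total for weight in weights]


def expected_model_counts(per_protein_bic):
    """Sum posterior probabilities over proteins for each candidate model."""
    counts = [0.0] * len(per_protein_bic[0])
    for scores in per_protein_bic:
        for model, probability in enumerate(posterior_from_bic(scores)):
            counts[model] += probability
    return counts
```

Here per_protein_bic is assumed to hold one list of eight BIC scores per protein, in a fixed model order.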

Mediation analysis to identify distant regulators and co-regulated proteins. For 
proteins with distant pQTL, mediation analysis was used to identify proteins and 
transcripts in that region that were likely to be the causal mediator of the QTL. 
Mediation analysis in this context is adapted from the general approach outlined
previously to differentiate moderator from mediator variables in social psychology
research. We implemented our method as the function ‘intermediate’ for the
open statistical language R. In brief, for a given distant pQTL, we first identified
all expressed proteins and transcripts within 10 Mb of the peak SNP—these genes 
are candidate mediators of the distant pQTL. We then included the protein abun- 
dance of each candidate individually as an additive covariate in the pQTL map- 
ping model and re-ran the regression at the peak distant SNP. We performed the 
same analysis with transcript abundance as the additive covariate. Our expectation 
was that many distant pQTL would be mediated by the protein and/or transcript 
abundance of a gene in that locus. For distant pQTL where this is true, including 
the abundance of the mediator protein/transcript in the pQTL mapping model 
should significantly decrease or abolish the distant pQTL effect—as evidenced 
by a decrease in LOD score. We calculate LOD scores using the ‘double-lod-diff’ 
method in r/intermediate to minimize the effects of missing data in the proteomics 
and RNA-seq data sets. 

Statistical analysis. To assess the significance of the LOD drop for a given can- 
didate mediator on a given distant pQTL, a null distribution of LOD scores was 
estimated by re-running the regression at the peak SNP and including all expressed 
proteins and transcripts outside of the candidate regions as additive covariates. In 
total, this yields mediation LOD scores for 8,050 proteins and 21,454 transcripts for 
each distant pQTL. Mediation LOD scores are then scaled to z-scores, and any
candidate with a conservative z-score < −6 is recorded as a potential causal mediator.
Further, any protein/transcript outside of the pQTL window with a z-score < −6
is recorded as a potential co-regulated partner of the target protein. We examined
1,130 distant pQTL and identified at least one candidate protein or transcript medi- 
ator for 743. In total, we found 618 unique protein/transcript mediators, of which 
534 regulated a single protein, 61 regulated two proteins, and 23 regulated three 
or more proteins. Furthermore, 84% of the top candidate protein mediators were 
themselves driven by a local pQTL. 
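
The z-score filter applied to mediation LOD scores can be sketched as follows. This is an illustration under stated assumptions: the function name is hypothetical, and standardizing against the mean and sample standard deviation of all mediation LOD scores for one distant pQTL is an assumption about how the scaling was done. The cut-off of −6 comes from the text.

```python
from statistics import mean, stdev


def flag_lod_drops(mediation_lods, z_cutoff=-6.0):
    """Return candidates whose mediation LOD score is an extreme low outlier.

    mediation_lods maps a candidate name to the LOD score of the distant
    pQTL after that candidate was added to the model as a covariate.
    """
    values = list(mediation_lods.values())
    centre = mean(values)
    spread = stdev(values)
    z_scores = {
        name: (lod - centre) / spread for name, lod in mediation_lods.items()
    }
    return {name: z for name, z in z_scores.items() if z < z_cutoff}
```

Candidates inside the pQTL window that pass this filter would correspond to potential mediators, and those outside it to potential co-regulated partners.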

Analysis of distant pQTL for transcriptional modes of regulation. We 
observed that a small subset of local pQTL and nearly all distant pQTL lacked 
corresponding eQTL. For these proteins, transcript and protein abundance 
appeared to be largely uncoupled (buffered). For the minority of local pQTL 
lacking corresponding local eQTL, we expected that mutations altering protein 
stabilization but not affecting transcript abundance conferred this effect. The 
paucity of distant pQTL with corresponding eQTL is especially puzzling given 
our initial expectation that trans effects on protein abundance would likely stem
from transcription factors or chromatin modifying proteins. We detected few
transcription factors and fewer transcription factor pQTL in our protein data set
(n = 132 expressed out of 2,243 annotated transcription factors; n = 21 out of 132
transcription factors with pQTL; n = 9 local transcription factor pQTL, n = 12
distant transcription factor pQTL), suggesting (as others have noted) that their
regulation is more evolutionarily constrained and less tolerant of genetic variation,
or alternatively, that the effects of any individual polymorphism in a transcription
factor may be buffered by other transcriptional components. Results
from recent large population genetics data sets support the former explanation,
and consequently distant effects from transcription factors may resist detection
by genetic mapping methods and account for the lack of distant pQTL that affect
both transcript and protein abundance.

Assembly and clustering of the distant pQTL regulatory network. We assembled 
the distant pQTL regulatory network by drawing directed edges to connect each 
trans-pQTL with its primary target protein. Each target protein was then connected 
to co-regulated proteins via directed edges. For purposes of graph assembly, each 
distant pQTL was represented by the protein most likely to be responsible for 
the effects of the QTL as indicated by mediation analysis. To identify clusters of 
co-regulated proteins, the directed network was converted to undirected form and 
subjected to MCL clustering using an inflation parameter of 1.5. Each cluster was
then evaluated for enrichment of PFAM domains, subcellular localizations,
or GO categories using a hypergeometric test with subsequent multiple testing
correction. P < 0.05 after multiple testing correction was considered indicative
of enrichment.
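
The hypergeometric enrichment test applied to each cluster can be written compactly in Python. The function and argument names are hypothetical, and the one-sided (upper-tail) form is an assumption that is typical for enrichment analysis.

```python
from math import comb


def hypergeometric_enrichment_p(n_hits, cluster_size, n_annotated, population):
    """P(X >= n_hits) when cluster_size proteins are drawn without
    replacement from a population containing n_annotated annotated ones."""
    total = comb(population, cluster_size)
    upper = min(cluster_size, n_annotated)
    tail = sum(
        comb(n_annotated, i) * comb(population - n_annotated, cluster_size - i)
        for i in range(n_hits, upper + 1)
    )
    return tail / total
```

The resulting P values, one per cluster and annotation term, would then be passed to a multiple testing correction.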

Mapping distant pQTL and co-regulated proteins onto the BioPlex protein 
interaction network. To quantify the extent to which direct physical interac- 
tions could explain distant pQTL regulation, each distant pQTL and its regu- 
lated proteins were associated with their human homologues using official gene 
symbols and mapped to the BioPlex network of human protein interactions.
Any protein that could not be mapped to the BioPlex network, either because 
a human homologue was not known or because the protein did not occur in 
the network, was excluded. Physical interactions connecting the pQTL and its 
co-regulated proteins were counted and compared against the maximum num- 
ber of pairwise connections to calculate the density of physical interactions. 
A binomial model was used to identify sets with unusually high numbers of 
interactions assuming the probability of an interaction occurring between two 
randomly selected proteins in the BioPlex network was 9.42 × 10⁻⁴ (the BioPlex
graph density). P values were adjusted for multiple hypothesis testing using the
method of Benjamini-Hochberg and those smaller than 0.05 after correction
were taken to be significant.
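
The density calculation and binomial model described above can be sketched in Python. The function names are hypothetical and the upper-tail form of the test is an assumption consistent with looking for unusually high numbers of interactions; the background probability is the BioPlex graph density quoted in the text.

```python
from math import comb

BIOPLEX_DENSITY = 9.42e-4  # background interaction probability from the text


def interaction_density(n_proteins, n_interactions):
    """Observed interactions divided by the maximum number of protein pairs."""
    max_pairs = n_proteins * (n_proteins - 1) // 2
    return n_interactions / max_pairs


def binomial_upper_tail(n_observed, n_pairs, p=BIOPLEX_DENSITY):
    """P(X >= n_observed) for X ~ Binomial(n_pairs, p)."""
    return sum(
        comb(n_pairs, i) * p ** i * (1.0 - p) ** (n_pairs - i)
        for i in range(n_observed, n_pairs + 1)
    )
```

For a set of four proteins in which all six possible pairs interact, the density is 1.0 and the tail probability reduces to the background probability raised to the sixth power.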

Extended Data Figure 1 | Proteomic profiling of the eight founder
strains used to create the DO mouse population. a, A multiplexed
TMT proteomics method was used to characterize protein expression
for the eight founder strains with two biological replicates for each strain
using both sexes. In total, just over 400,000 peptides were quantified
corresponding to 7,699 proteins. b, Hierarchical clustering and principal
component analysis determined that the major source of variation in
protein expression is due to genetic variation among the eight strains and
the sex within strains. c, K-means clustering and gene set enrichment
determined that each of the clusters was specifically enriched for
metabolic pathways, biological processes or cellular components. d, Proteins
representing each of the displayed clusters from c. These proteins have
specific patterns of expression as exemplified by PCK1, which was highly
expressed in the NOD strain. Other examples include SCD1, which was
highly expressed in C57BL/6J and NZO strains (n = 4 mice for each
founder, 2 male and 2 female, black bars represent median values). Protein
abundance is shown as the percentage contribution of that mouse’s protein
levels to its respective 10-plex.

Extended Data Figure 2 | The influence of sex and diet on protein and
transcript abundance. a, Principal component analysis aligns well with
sex and diet as major experimental contributors of variation in protein
abundance. b, Female-specific protein abundance profiles for SULT2A1
and FMO3. c, Male-specific protein abundance profiles for CYP4A12A
and MUP3. d, e, Diet also resulted in the regulation of many proteins,
which are represented by proteins such as SCD1 and ACACA that
increased in abundance and proteins such as HMGCR and SQLE that
decreased in abundance. f, Principal component analysis aligns well with
sex and diet as major experimental contributors of variation in transcript
abundance. g-j, Transcript scatter plots for the proteins in b-e. Transcript
abundance data were transformed to rank normal scores for plotting.

Extended Data Figure 3 | Genetic effects drive much of the observed
expression variance in the RNA-seq and proteomics data. Liver
transcript and protein abundance are highly variable in the DO
population. Among the discovery set (n = 6,707 proteins, 6,647 genes),
much of this variance can be attributed to one or more experimental
variables and/or genetic effects. a-c, The experimental covariates sex
and diet influence many transcripts and proteins in an additive manner;
however, the interaction of sex and diet does not seem to affect many
genes. The effects from sex and diet are not biased towards one molecular
species; that is, similar numbers of transcripts and proteins are similarly
affected by these experimental variables. Genetic variation underlies
many of the most variable transcripts and proteins. d, e, Local genetic
variation in particular is a strong driver of expression variation for many
genes, while distant genetic effects are observed but more subtle. Among
the discovery set, we observe more and larger genetic effects (both local
and distant) on transcript abundance than protein abundance. f, For
most transcripts and proteins detected in this study, expression variation
is minimal, cannot be attributed to a known experimental or genetic
variable, and is plotted as noise. g, pQTL map for all 6,707 proteins tested
from genetic linkage analysis. h, i, QTL mapping identified the genetic loci
that underlie variability in transcript abundance (eQTL). For the discovery
set of transcripts with detected proteins and the larger set of all expressed
genes, the location of the eQTL is plotted on the x axis and the location of
the controlled gene is plotted on the y axis. Most genetic effects are local
and map to the same location as the gene, as evidenced by the prominent
diagonal line in both maps.

Extended Data Figure 4 | Replication rates for eQTL are highly
correlated with effect size, and local eQTL replicate at higher rates
than distant eQTL. a, To assess replication of eQTL, an independent set
of 192 DO liver RNA-seq samples was analysed (‘replication set’) and
compared to the discovery set. A total of 16,839 genes were expressed in
half or more samples in both data sets. For each gene, the most significant
proximal locus (within ±10 Mb of the gene) and distant locus (located on a
different chromosome from the gene) were identified from the discovery
set; LOD scores at these loci are plotted on the x axis (local in red; distant
in blue). Next, the most significant loci within a 10-Mb window flanking
the local and distant loci from the discovery set were identified in the
replication set and plotted on the y axis. LOD scores are highly correlated
at these peak loci (local Pearson r = 0.91; distant r = 0.84). b, For the
core set of 6,707 proteins (6,647 gene ids), pQTL and eQTL overlap were
compared at multiple genome-wide P value thresholds from 0.01 to 0.2.
Again, one maximum proximal locus and one maximum distant locus
were identified for each gene/protein, and recorded if it met the P value
cut off. Local pQTL exhibit high overlap with both the discovery eQTL
set and replication eQTL set, regardless of P value threshold (67-80%).
Distant pQTL exhibit slightly higher overlap with eQTL at the most
stringent P value cut off; however, overlap is consistently low for distant
pQTL (<1-2%). Local eQTL overlap well with the replication eQTL set
regardless of P value threshold (75-77%). Distant eQTL replicate poorly
overall (3-31%), but overlap rate is highest (31%) at the most stringent
P value threshold, suggesting that larger sample sizes will be required to
fully and accurately characterize distant effects on gene expression. c, The
maximum proximal locus and distant locus were identified for each of the
6,707 proteins and transcripts, and the cumulative distribution of their
LOD scores is plotted (blue = proteins, green = transcripts). LOD score
is plotted on the x axis, and the proportion of total QTL is plotted on the
y axis. Local eQTL as a group exhibit higher LOD scores (consistent with
higher effect sizes) than local pQTL (ninetieth percentile LOD = 23.9 for
local eQTL, 13.6 for pQTL), while distant eQTL and pQTL are of similar
scale (ninetieth percentile LOD = 7.9 for distant eQTL, 8.2 for distant
pQTL). d, Comparison of pQTL from the discovery set to eQTL from the
discovery set (left set of Venn diagrams) and eQTL from the replication
set (right). As expected given that they derive from the same samples,
local pQTL and eQTL overlap is observed to be higher in the discovery
set (1,392 out of 1,736 = 80%); however, local pQTL still overlap well with
eQTL from the replication set (1,273 out of 1,736 = 73%). Distant pQTL
overlap poorly with both eQTL sets (9 out of 1,048 in discovery set; 8 out
of 1,048 in replication set); however, 6 of 9 distant pQTL that do overlap
with eQTL in the discovery set are also identified as overlapping in the
replication set.

Extended Data Figure 5 | BIC model selection reveals transcriptional
mechanisms driving most local pQTL and post-transcriptional
mechanisms underlying most distant pQTL. We identified the local and
distant QTL with the maximum LOD score (regardless of significance)
for each of the 6,707 proteins, and used BIC to assess eight models linking
QTL genotype to transcript and protein abundance. Most proteins are not
affected by the local or distant QTL, and fall in one of the three groups
below outlined by the dotted line. Among the five models where a QTL
effect on protein abundance is detected, two are transcriptional in nature
(L1, L2; D1, D2); the QTL effect on protein abundance is conferred at least
partially through the transcript. The remaining three genetic models are
post-transcriptional (L3-5; D3-5); the QTL effect on protein abundance is
not mediated through the transcript. The transcriptional L1 and L2 models
are identified as the best models for most local pQTL, while the
post-transcriptional D3 and D4 models are optimal for most distant pQTL.

Extended Data Figure 6 | Examples of local pQTL that are due to
an underlying eQTL and those that are due to post-transcriptional
mechanisms. a, The protein DHTKD1 contained a local acting eQTL
and pQTL, which was associated with increased transcript and protein
abundance derived from 129S1/SvImJ, CAST/EiJ, PWK/PhJ and WSB/EiJ
strains. Mice were divided into three groups depending on whether or not
their genomes contained 0, 1 or 2 of the alleles found to be associated with
the pQTL. These increases in protein abundance were further validated
using the proteomic analysis of the founder strains. b, c, Similarly, Ces2h
and Pipox had both a local acting eQTL and pQTL that could be associated
with specific strains (CAST/EiJ, PWK/PhJ and WSB/EiJ). These protein
abundance measurements were further validated using the founder strains
data set. d, e, Alternatively, 10% of the genes had local pQTL but lacked
local eQTLs, which is evident in proteins such as ENTPD5 and OMA1.
The founder allele expression patterns inferred at the pQTL were validated
by protein abundance measurements in the founder strains, which
could be explained by CAST/EiJ-specific missense mutations in both genes.
f, Likewise, Lars2 also contained a pQTL that had no observable eQTL that
showed a decrease in protein abundance in the 129S1/SvImJ, CAST/EiJ,
PWK/PhJ and WSB/EiJ strains. Genome sequencing determined that these
strains share four missense mutations (*P < 0.01 using a Student's t-test;
for founder strains, n = 4 mice for each founder, 2 male and 2 female, error
bars represent s.d.).

Extended Data Figure 7 | The causal relationship between genetic
variation and protein expression was determined for over 700
proteins as inferred by mediation analysis. a-d, Many of the causal
relationships between proteins have been previously documented such as
the associations between SNX7-SNX4, PGAM1-PGAM2, LRRFIP1-FLII
and PPIF-PPIE. e-h, In addition, many of the protein associations had
not been previously documented such as UPB1-MTR, FOCAD-AVEN,
AGPAT9-CHP1 and ANXA1-ARAD 1A. i-l, Protein associations were
also identified for multimeric complexes such as ECSIT-NDUFAF1-TMEM126B,
DMXL2-ROGDI-WDR7, PIGU-PIGT-PIGS and IKBKAP-ELP2-ELP3.

Extended Data Figure 8 | Mediation analysis for CCT complex members
details the effects of a QTL in Cct6a on protein abundance through
post-transcriptional protein buffering. a-f, Mediation analysis for each
member of the CCT complex identifies Cct6a as the causal intermediate. A local
QTL for Cct6a affects transcript and protein abundance, and CCT6A
abundance sets the abundance of other CCT proteins regardless of
variation in their transcripts. For each of the complex members tested,
all other complex members are confirmed to be co-regulated providing
additional supporting evidence for stoichiometric buffering.

© 2016 Macmillan Publishers Limited. All rights reserved 


Extended Data Figure 9 | Distant pQTL and co-regulated proteins
frequently correspond to complexes of physically interacting proteins.
a, Distant pQTL and co-regulated proteins assemble to form a regulatory
network, which is defined by protein clusters with distinct topologies.
A total of 3,938 proteins/QTL are linked by 5,794 associations. Distant
pQTL are depicted as purple arrows pointing from the inferred causal
protein to its regulated pair. Co-regulated proteins are connected with
green arrows emanating from the primary target protein. b, MCL
clustering decomposes the distant pQTL network into 671 clusters.
Cluster size varies considerably, although most clusters contain fewer
than 20 proteins. c, Clusters extracted from the distant pQTL network
frequently associate proteins with shared biological functions. More than
half of clusters are enriched for at least one GO category, as depicted in
the bar chart above. d-f, Three selected clusters of distant pQTL and
co-regulated proteins. g, To understand the relationship between the distant
pQTL associations and protein interactions, each distant pQTL and its
co-regulated proteins were mapped to their human homologues in the
BioPlex network of human protein interactions. To assess the tendency
for these co-regulated proteins to cluster together, the median graph
distance separating all pairs of co-regulated proteins was determined. The
distribution of median distances observed for equal numbers of randomly
selected proteins was also determined and used to assign a Z-score to each
distant pQTL and its co-regulated proteins. h, Histogram depicting the
Z-score distribution for distant pQTL and co-regulated proteins. Z-scores
below −2.5 (highlighted in red) indicated that co-regulated proteins
were unusually close within the BioPlex network. i-l, Selected distant
pQTL and co-regulated proteins, mapped onto the BioPlex network of
protein interactions. All shortest paths connecting distant pQTL and their
regulated proteins have been extracted from the BioPlex network and
displayed. Proteins inferred to be responsible for each QTL are purple,
while primary regulated proteins are red and secondary co-regulated
proteins are green. Grey circles represent neighbouring proteins in the
BioPlex network that were not found to be co-regulated. Grey edges
indicate BioPlex interactions, while blue edges denote co-regulation
uncovered from trans-QTL analysis.

Extended Data Figure 10 | Comparison of protein abundance in the
DO and founder strains reveals a positive correlation between pQTL
significance and predictive power. a, b, For all detected liver pQTL in the
DO population, founder strain allelic contributions were derived from the
mapping model and compared to protein abundance measured directly
from the eight founder strains. Pearson correlations are plotted against
the LOD score of the pQTL for both local and distant pQTL. Predictive
power tracks well with pQTL significance. Local pQTL tend to be more
significant and yield higher predictive power than distant pQTL, however
highly significant distant pQTL (>10 LOD) have comparable predictive
power to local pQTL of similar significance.


Crystal structure of the epithelial
calcium channel TRPV6


Kei Saotome¹*, Appu K. Singh¹*, Maria V. Yelshanskaya¹ & Alexander I. Sobolevsky¹


Precise regulation of calcium homeostasis is essential for many physiological functions. The Ca²⁺-selective transient
receptor potential (TRP) channels TRPV5 and TRPV6 play vital roles in calcium homeostasis as Ca²⁺ uptake channels in
epithelial tissues. Detailed structural bases for their assembly and Ca²⁺ permeation remain obscure. Here we report the
crystal structure of rat TRPV6 at 3.25 Å resolution. The overall architecture of TRPV6 reveals shared and unique features
compared with other TRP channels. Intracellular domains engage in extensive interactions to form an intracellular ‘skirt’
involved in allosteric modulation. In the K⁺ channel-like transmembrane domain, Ca²⁺ selectivity is determined by direct
coordination of Ca²⁺ by a ring of aspartate side chains in the selectivity filter. On the basis of crystallographically identified
cation-binding sites at the pore axis and extracellular vestibule, we propose a Ca²⁺ permeation mechanism. Our results
provide a structural foundation for understanding the regulation of epithelial Ca²⁺ uptake and its role in pathophysiology.


The TRP channels are a superfamily of cation-permeable ion channels
that are widely known for their role as transducers of sensory modalities.
TRPV5 and TRPV6 are TRP channels that are uniquely selective
for Ca²⁺ (permeability ratio PCa/PNa > 100) (ref. 2). They have not
been reported to be responsive to temperature, tastants or odours, but
the mechanosensitive properties of TRPV6 appear to be important
for the formation of microvilli. TRPV5 and TRPV6 belong to the
vanilloid subfamily of TRP channels, share ~75% sequence identity
and are involved in the transport of calcium through epithelial cell
membranes. Knockout of TRPV6 in mice leads to various phenotypes
linked to impaired Ca²⁺ homeostasis, including defective intestinal
Ca²⁺ absorption, lower body weight, impaired fertility and dermatitis.
Altered TRPV6 expression has also been shown in various
transgenic mouse models of human diseases, including Crohn’s and
kidney stone diseases. In addition, TRPV6 is implicated in the development
and progression of numerous forms of cancer, and its overexpression
pattern correlates with the aggressiveness of the disease.
Accordingly, TRPV6 has emerged as a target for diagnosing and treating
various carcinomas.

Structurally, TRPV5 and TRPV6 represent homo- or heteromeric
assemblies of four subunits, each containing a central K⁺-channel-like
transmembrane domain that is flanked by intracellular amino (N)- and
carboxy (C)-terminal domains. The overall architecture and potential
gating mechanisms of TRP channels have recently been illuminated by
cryo-electron microscopy structures of TRPV1 (refs 14, 15), TRPV2
(ref. 16) and TRPA1 (ref. 17). However, the absence of structural bases
for the unique physiological properties of TRPV5 and TRPV6 motivated
us to study these epithelial Ca²⁺ channels.


Structure determination 

We screened various orthologues of TRPV5 and TRPV6 and identified
rat TRPV6 as a promising candidate for our structural studies. We
modified the 727-residue wild-type rat TRPV6 polypeptide to create
the crystallization construct TRPV6cryst (see Methods). Experiments
with the fluorescent Ca²⁺ indicator Fura-2 AM show that cells expressing
TRPV6cryst exhibit Ca²⁺ permeability similar to wild type (Extended
Data Fig. 1).


The best crystals of TRPV6cryst diffracted to 3.25 Å resolution. We
solved the TRPV6cryst structure by molecular replacement, and the
electron density map (Extended Data Fig. 2) was readily interpretable
for most of the polypeptide (see Methods). Sequence registry was
aided by anomalous difference Fourier maps highlighting natural
sulfur atoms of cysteines and methionines, and selenium atoms
in protein with selenomethionines substituted for methionines
(Extended Data Fig. 3). The resulting model of TRPV6cryst was refined
to good crystallographic statistics and stereochemistry (Extended
Data Table 1).


Architecture and domain organization 

The four-fold symmetrical structure of TRPV6cryst (Fig. 1) contains two
main components: a transmembrane domain with a central ion channel
pore, and a ~70 Å-tall and ~110 Å-wide intracellular skirt where four
subunits constitute walls enclosing a ~50 Å × 50 Å-wide cavity
underneath the ion channel. Like TRPV1 (ref. 14) (Extended Data Fig. 4)
and TRPV2 (ref. 16), the intracellular domains of a single TRPV6cryst
subunit contain an ankyrin repeat domain with six ankyrin repeats,
followed by a linker domain that includes a β-hairpin (composed
of β-strands β1 and β2) and a helix-turn-helix motif resembling a
seventh ankyrin repeat, and the pre-S1 helix, which connects the linker
domain to the transmembrane domain (Fig. 1d-f and Extended Data
Fig. 4). Also similar to TRPV1/2, a six-residue stretch at the C terminus
constitutes a β-strand (β3) that tethers to the β-hairpin in the linker
domain to create a three-stranded β-sheet. In addition to the conserved
domains, TRPV6cryst also includes an N-terminal helix and C-terminal
hook, which pack against each other to form an intersubunit interface
along the corners of the intracellular skirt.

Similar to other TRP channels (refs 14, 16, 17), the transmembrane domain
of TRPV6cryst crudely resembles voltage-gated K⁺ (ref. 18) or Na⁺
(ref. 19) channels and includes six transmembrane helices (S1-S6) and
a pore loop (P-loop) between S5 and S6. The first four transmembrane
helices form a bundle to constitute the S1-S4 domain. The packing of
aromatic side chains in S1-S4 rigidifies the helical bundle conformation
(Extended Data Fig. 4c), suggesting that this domain remains relatively
static during gating. The linker between the S1-S4 domain and pore
domain is unstructured, in marked contrast to other TRP
channels, in which it assumes a helical conformation and mediates
interdomain interactions (refs 14, 16, 17). Following S6 is the amphipathic
TRP helix, which runs parallel to the membrane and interacts with
intracellular soluble domains in a manner analogous to TRPV1, TRPV2 and
TRPA1 (refs 14, 16, 17).

Figure 1 | Architecture and domain organization of TRPV6cryst. a-c, Side (a), bottom (b) and top (c) views of the TRPV6cryst tetramer, with each
subunit shown in a different colour. d, Domain organization diagram of the TRPV6 subunit. e, f, Two views of the TRPV6cryst subunit, with domains
coloured as in d.

Although the overall domain organization of TRPV6cryst resembles
TRPV1/2 (refs 14, 16) and, to a lesser degree, TRPA1 (ref. 17), electron
density for the linker between S6 and TRP helix (Extended Data Fig. 2f)
and disulfide crosslink experiments (Extended Data Fig. 5a-c) imply a
unique non-swapped transmembrane domain arrangement in which
the S1-S4 domain and pore domain of the same protomer are packed
against each other. While this unique domain arrangement could have
profound implications, we present this aspect of the TRPV6cryst model
cautiously because of the absence of interpretable density for the S4-S5
linker.


Assembly and subunit interfaces 

Assembly of TRPV6cryst is mediated by multiple interdomain interfaces
(Fig. 2). Close packing of S5 against S4 and S1 of the adjacent S1-S4
domain immobilizes the pore module with respect to the S1-S4 domain
(Fig. 2a), a trait that is reminiscent of the Slo2.2 K⁺ channel (ref. 20)
and distinct from voltage-gated channels (ref. 18). Further, the S1-S2
extracellular loop contacts the S5-P and P-S6 loops (Fig. 2a). This
interaction hints at a structural basis for the regulation of TRPV5 and
TRPV6 function by the β-glucuronidase klotho, which modulates channel
activity by modifying the conserved N-linked glycosylation site (ref. 21)
located in the middle of this loop (N357 in TRPV6cryst).


The intracellular domains of TRPV5 and TRPV6 have been implicated
in tetrameric assembly (ref. 22), trafficking (ref. 23) and regulation of
channel activity by the Ca²⁺ sensor calmodulin (refs 24-26). The structure
of TRPV6cryst reveals that numerous non-contiguous intracellular domains
engage in extensive inter- and intrasubunit interactions (Fig. 2c). At the
centre of these interactions is the N-terminal helix, which is positioned
as a pillar along the corners of the intracellular skirt. Putative hydrogen
bonds and salt bridges involving D34 stabilize the interaction between
the N-terminal helix and three-stranded β-sheet. Notably, mutation
of the equivalent D34 to alanine abolished Ca²⁺ uptake function in
TRPV5 (ref. 23), suggesting this interaction's functional importance.
The N-terminal helix also forms hydrophobic and hydrogen bonding
interactions with the C-terminal hook and pre-S1 helix from an adjacent
subunit. Since it is a hub for domain interactions, endogenous or
exogenous factors could allosterically modulate channel activity by
targeting the N-terminal helix. Interestingly, we observed a robust
cylindrical density at the intersubunit interface formed by the N-terminal
helix, ankyrin repeat domain and three-stranded β-sheet (Extended
Data Fig. 5d-h). We have tentatively attributed this density to
desthiobiotin (DTB), which was included as an eluent in the TRPV6cryst
affinity purification procedure (see Methods).


Ion-conducting pore 

The extracellular portion of the TRPV6cryst ion-conducting pore is
formed by extracellular loops connecting the P-loop helix to S5 and S6,
while the rest of the ion conduction pathway is formed entirely by the
S6 helices (Fig. 3). Such pore architecture is conserved over the entire
family of tetrameric ion channels (Extended Data Fig. 6).

The region connecting S5 and S6 contains eight acidic residues per
protomer, four of which face the ion conduction pathway to produce
a highly electronegative 'mouth' to the pore (Fig. 3a-c). Below this
extracellular vestibule is a four-residue selectivity filter (⁵³⁸TIID⁵⁴¹)
(Fig. 3d-f). The side chains of D541, which have previously been
identified as critical for Ca²⁺ selectivity, permeation and voltage-dependent
Mg²⁺ block (ref. 27), protrude towards the pore central axis to produce a
minimum interatomic distance of 4.6 Å (Fig. 3f, g) at the upper tip of the
selectivity filter. Three phenylalanine residues (F530, F533 and F536)
in the pore helix, which are conserved in TRPV5-6, but only one of
which is conserved in TRPV1-4 (Extended Data Fig. 7), may restrict
the dynamics of the pore helix. A relatively static outer pore domain could
reflect a key difference between TRPV5/6 and other TRPV channels, which
gate in response to various stimuli and thus should display a higher degree
of structural plasticity, as exemplified by toxin- and capsaicin-induced
conformational changes in TRPV1 (ref. 15).

Figure 2 | Domain interfaces. a, Transmembrane helices S4 and S5 and
extracellular loops S1-S2, S5-P and P-S6 contribute to interfaces between
the S1-S4 domain and pore domain. b, Side view of the TRPV6cryst
tetramer with domains coloured differently. Boxes indicate domain
interfaces expanded in a and c. c, Interfaces between soluble domains.
Residues at domain interfaces in a and c are shown in stick representation,
with potential hydrogen bonds and electrostatic interactions shown as
dashed lines. The predicted N-linked glycosylation site conserved in
TRPV6 channels, N357 in TRPV6cryst, is labelled red in a.

Below the selectivity filter, the pore widens into a large, mainly
hydrophobic cavity (Fig. 3e). Lateral pore portals (Fig. 3a, b) may
provide hydrophobic access to this cavity for small molecules or lipids,
similar to voltage-gated Na⁺ channels (ref. 19). The large diameter of the
hydrophobic cavity (~13 Å) can easily accommodate a fully hydrated
calcium ion, which has an effective diameter of 8-10 Å. The S6 helices
cross at the intracellular portion of the channel, where the M577 side
chains form the narrow constriction (5.1 Å diameter) and define
the lower gate (Fig. 3d-f), similar to TRPV2 (ref. 16). Importantly,
anomalous diffraction from crystals grown with selenomethionine-labelled
protein showed a robust signal (Fig. 3h and Extended Data
Fig. 3c), confirming that M577 side chains occlude the pore. Despite
high sequence conservation in this region (Extended Data Fig. 7), in
TRPV1 the equivalent residue to TRPV6 M577 points away from the
pore axis (Extended Data Fig. 6a).


Cation-binding sites

Previous work has proposed that TRPV5 and TRPV6
achieve their exceptional Ca²⁺ selectivity through binding of Ca²⁺ to
the selectivity filter. Indeed, we observed a strong 2Fo − Fc density
consistent with a bound ion at the central pore axis, surrounded closely
by the carbonyl oxygens of D541 side chains (Extended Data Fig. 2e).
Since the pore diameter here (4.6 Å, measured between centres of
opposing oxygen atoms) is large enough to accommodate a dehydrated
calcium ion (typical Ca²⁺-oxygen distance is ~2.4 Å), we contend that
the selectivity filter is captured in a Ca²⁺-conducting state. To further
resolve cation-binding sites in the pore, we co-crystallized TRPV6cryst
with Ca²⁺, Ba²⁺ or Gd³⁺, which have various permeation and
channel-blocking properties (ref. 27) (Extended Data Fig. 1), and collected
X-ray diffraction data to locate anomalous difference peaks.
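
The geometric argument in this paragraph can be checked with a few lines of arithmetic. The sketch below assumes an idealized arrangement in which the ion sits exactly on the pore axis, midway between two opposing D541 oxygen atoms. The function names and the 0.3 Å tolerance are illustrative assumptions and are not part of the original analysis.

```python
# Idealized check: an ion on the pore axis, midway between two opposing
# oxygen atoms, lies at half the oxygen-to-oxygen distance from each.
# The two distances come from the text; the tolerance is an assumption.

OPPOSING_OXYGEN_DISTANCE = 4.6  # Å, between centres of opposing oxygens
TYPICAL_CA_O_DISTANCE = 2.4     # Å, typical Ca2+-oxygen distance


def axial_ion_to_oxygen_distance(opposing_distance: float) -> float:
    """Distance from an on-axis ion to each of two opposing oxygens."""
    return opposing_distance / 2.0


def compatible_with_direct_coordination(
    opposing_distance: float,
    typical_distance: float = TYPICAL_CA_O_DISTANCE,
    tolerance: float = 0.3,
) -> bool:
    """True if the on-axis ion-oxygen distance is within `tolerance` of typical."""
    d = axial_ion_to_oxygen_distance(opposing_distance)
    return abs(d - typical_distance) <= tolerance


if __name__ == "__main__":
    d = axial_ion_to_oxygen_distance(OPPOSING_OXYGEN_DISTANCE)
    print(f"on-axis ion-oxygen distance: {d:.2f} Å")
    print(compatible_with_direct_coordination(OPPOSING_OXYGEN_DISTANCE))
```

Under these assumptions the on-axis distance is half of 4.6 Å, which is close to the typical Ca²⁺-oxygen distance quoted in the text and therefore consistent with direct coordination of a dehydrated ion.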

The anomalous difference peaks suggest the presence of four types of
cation-binding site in the TRPV6cryst channel pore (Fig. 4). Notably, two
of these (sites 1 and 2) have locations approximately equivalent to Ca²⁺
sites in the genetically engineered Ca²⁺-selective channel CavAb (ref. 28),
but none of them overlap with the putative Ca²⁺ site in Cav1.1 (ref. 29)
(Extended Data Fig. 6m-o). For Ba²⁺ and Gd³⁺, four symmetry-related
peaks were observed in the TRPV6cryst outer vestibule, in the
vicinity of D517, E518 and D547 (Fig. 4c-f). Interestingly, the Ba²⁺
and Gd³⁺ sites occupy distinct locations, probably because of a difference
in charge density. Although these signals were not observed for
TRPV6cryst co-crystallized with Ca²⁺ (Fig. 4a, b), presumably because
of lower affinity, reduced occupancy or weaker anomalous signal, we
speculate that the highly electronegative outer vestibule is involved in
the general recruitment of cations towards the extracellular vestibule of
the TRPV6 channel. Lower affinity of the recruitment sites compared
with the main binding site in the centre of the pore for Gd³⁺ is
consistent with the results of isothermal titration calorimetry experiments
(Extended Data Fig. 8a, b).

The strongest anomalous difference peaks for Ca²⁺ and Gd³⁺ were
observed along the central pore axis at or near the same plane as
D541 side chains (Fig. 4a, b, e, f), indicating that this locus constitutes
the main cation-binding site (site 1). The cation-oxygen distance of
2.4 Å (Fig. 5b) matches the reported average Ca²⁺-oxygen distance
calculated from crystal structures of various classes of Ca²⁺-binding
proteins (ref. 30). This minimal interatomic distance suggests that the
carboxylate oxygens of D541 directly coordinate an at least partly dehydrated
Ca²⁺ ion at this site. Similarly, structural studies of the hexameric
Ca²⁺ release-activated channel Orai suggest that Ca²⁺ selectivity is
achieved by direct coordination of Ca²⁺ by a ring of glutamate residues
at the extracellular entrance to the pore (ref. 31). By contrast, in CavAb,
the permeant Ca²⁺ ion indirectly interacts with the pore through water
molecules (ref. 28). The presence of a robust Gd³⁺ signal at site 1 shows that
trivalents can bind at D541 as well (Fig. 4e, f).

For Ca²⁺ and Ba²⁺, an additional anomalous difference signal is
observed at the centre of the pore, 6-8 Å below site 1, between the
backbone carbonyls and side-chain hydroxyl groups of T538 (site 2). The
greater Ca²⁺/Ba²⁺-oxygen distance at site 2 (~4 Å, Figs 4a, c and 5c)
indicates that the cation is equatorially hydrated at this location.
Although the chemical environment of site 2 suggests that it binds
cations at lower affinity than site 1, the Ba²⁺ signal is stronger at this
site (Extended Data Fig. 8c). The different relative anomalous peak
intensities of sites 1 and 2 for Ca²⁺ and Ba²⁺, as well as their slightly
different positions at site 1, may arise from the greater size of Ba²⁺
(~3 Å diameter) than Ca²⁺ (~2 Å diameter). This observation implies
that the TRPV6 selectivity filter discriminates ions on the basis of size
as well as charge.

Anomalous difference peaks were observed for Ca²⁺ and Ba²⁺
6-8 Å below site 2 in the centre of the hydrophobic cavity, at the level
of M569 (site 3) (Figs 4a, c and 5a, d). For Ca²⁺, the anomalous peak
at site 3 is less robust (Extended Data Fig. 8c), presumably because of
weaker anomalous diffraction properties. The signal at site 3 suggests
that cations bound here are ordered by water molecules, which can be
held in place by weak hydrogen bonding interactions and pore helix
dipoles pointing their partial negative charges towards the centre of
the hydrophobic cavity.

Figure 3 | Permeation pathway. a-c, Side (a), central slice (b) and top
(c) views of TRPV6cryst structure in surface representation, coloured by
electrostatic potential. d, Ribbon diagram of the TRPV6cryst tetramer,
with ion conduction pathway shown in cyan. e, Expanded view of the
TRPV6cryst pore, with front and back subunits excluded for clarity. Acidic
side chains in the extracellular vestibule and pore-lining side chains are
shown as sticks. f, Radius of the pore calculated using HOLE. D541 and
M577 form narrow constrictions at the selectivity filter and intracellular
gate, respectively. g, h, Top views of narrow constrictions formed
by D541 (g) and M577 (h). In h, blue and pink mesh shows electron
density for M577 (2Fo − Fc, 45-3.25 Å, 1.0σ) and anomalous difference
electron density from selenomethionine-labelled crystal (30-5.00 Å, 3.0σ),
respectively.

Figure 4 | Cation-binding sites in the TRPV6cryst pore. Side (a, c, e) and
top (b, d, f) views of the TRPV6cryst pore, with residues important for
cation binding shown in stick representation. Front and back subunits
in a, c and e are removed for clarity. Green, blue and pink mesh shows
anomalous difference electron density for Ca²⁺ (a, b, 38-4.59 Å, 2.7σ),
Ba²⁺ (c, d, 38-4.59 Å, 3.5σ) and Gd³⁺ (e, f, 38-4.59 Å, 7σ), and ions
are shown as spheres of the corresponding colour. Purple mesh shows
simulated-annealing Fo − Fc electron density maps contoured at 4σ for
Ca²⁺ (50-3.65 Å), 3σ for Ba²⁺ (50-3.85 Å) and 3.5σ for Gd³⁺ (50-3.80 Å).
The amplitudes of the anomalous peaks are listed in Extended Data Fig. 8c.
D547 and E518 side chains are apparently involved in coordination of
Ba²⁺ ions at the recruitment sites. The Gd³⁺ recruitment sites are distinct
from Ba²⁺ and apparently involve coordination by D517, E518 and D547
side chains.

Figure 5 | Calcium permeation mechanism. a, Side view of TRPV6cryst
pore, with front and back subunits removed for clarity. Residues that
surround or contribute to cation-binding sites are shown as sticks, and
Ca²⁺ ions at sites 1, 2 and 3 are shown as green spheres. b-d, Top views
of Ca²⁺ ions at sites 1 (b), 2 (c) and 3 (d), with nearby residues shown as
sticks. The interatomic distances illustrated by dashed lines suggest that
Ca²⁺ is directly coordinated by D541 side chains at site 1, while a hydrated
Ca²⁺ ion indirectly interacts with the pore at sites 2 and 3. e, Schematic
representation of various Ca²⁺ occupancy states in TRPV6. Presumed
lower energy states are shown in yellow, and most probable transitions
are highlighted with bold arrows. Occupancy states in which site 1 is
vacant (shown in grey) are likely only to be transiently populated, owing
to electrostatic repulsion of D541 side chains. Sufficiently large distances
between sites 1, 2 and 3 suggest that electrostatic repulsion between Ca²⁺
ions does not preclude simultaneous binding at all three sites.


Mechanism of ion permeation 

The pore architecture and locations of cation-binding sites in the
TRPV6cryst structure (Fig. 5a-d) illuminate a potential calcium
permeation mechanism (Fig. 5e). The close proximity of carboxylate
side chains at site 1 suggests that, in the present pore conformation,
the absence of a bound Ca²⁺ ion would be energetically unfavourable
because of charge repulsion between D541 side chains. Thus, it
is likely that a Ca²⁺ ion is, in effect, constitutively bound at site 1 and
removal of a Ca²⁺ ion from site 1 would require immediate replacement
with another Ca²⁺ ion, necessitating a 'knock-off' mechanism of
permeation similar to the genetically engineered Ca²⁺-selective channel
CavAb (ref. 28). Given the large energetic barrier of displacing a Ca²⁺
ion at site 1, a substantially high local concentration of Ca²⁺ would
be necessary for permeation to proceed at physiological membrane
voltages. Recruitment sites in the highly electronegative extracellular
vestibule might serve this purpose.

As direct coordination by aspartate side chains suggests that site 1
is the highest affinity site for Ca²⁺ in the TRPV6 channel pore, knock-off
from site 1 is likely to be the rate-limiting step for Ca²⁺ permeation.
After the Ca²⁺ ion is knocked off site 1, it moves towards site 2, where
it is coordinated through its hydration shell by the backbone carbonyls
and side-chain hydroxyls of T538. In CavAb (ref. 28), Ca²⁺ also binds in the
middle of the selectivity filter, at a locus between site 1 and site 2 of
TRPV6cryst (Extended Data Fig. 6m, n). Although we found no
crystallographic evidence for Ca²⁺ bound at an equivalent site in TRPV6cryst,
it is plausible that such a site is occupied transiently during stepwise
Ca²⁺ permeation. Whether a knock-off is necessary for the Ca²⁺ ion
to traverse from site 2 to site 3 is unclear, as electrostatic repulsion
between Ca²⁺ ions at site 1 and site 2 (and possibly, the aforementioned
site between sites 1 and 2) may contribute a driving force. At site 3, the
Ca²⁺ ion is poised to enter the cell. Since the lower gate is closed in the
current TRPV6cryst structure, further studies are necessary to elucidate
whether its opening affects cation binding in the pore.
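
The permeation scheme described above can be laid out as a small enumeration of pore occupancy states. The sketch below is only a bookkeeping aid under stated assumptions: it treats sites 1, 2 and 3 as simply occupied or vacant, and it labels every state with site 1 vacant as presumed transient, following the argument that an empty site 1 is energetically unfavourable. The names `occupancy_states`, `is_transient` and `describe` are hypothetical, and no energies or rates are modelled.

```python
from itertools import product

# The three axial cation-binding sites discussed in the text.
SITES = ("site 1", "site 2", "site 3")


def occupancy_states():
    """Return all 2**3 occupancy patterns as (site1, site2, site3) booleans."""
    return list(product((True, False), repeat=len(SITES)))


def is_transient(state):
    """Assumption from the text: a vacant site 1 is only transiently populated."""
    return not state[0]


def describe(state):
    """Human-readable summary of one occupancy state."""
    occupied = [name for name, filled in zip(SITES, state) if filled]
    label = "presumed transient" if is_transient(state) else "site 1 occupied"
    return f"{', '.join(occupied) or 'empty'}: {label}"


if __name__ == "__main__":
    for state in occupancy_states():
        print(describe(state))
```

Of the eight patterns, four keep site 1 occupied. In this simplified picture, a knock-off step corresponds to moving between two such states by way of a short-lived state in which site 1 is vacant.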

Previous observations have suggested that, in addition to Ca²⁺,
TRPV6 is permeable to other divalents (with ion permeation sequence
Ca²⁺ > Sr²⁺ ≈ Ba²⁺ > Mn²⁺) (ref. 4) and weakly to trivalents (La³⁺ and
Gd³⁺) (ref. 27) as well. The anomalous difference peaks for Ba²⁺ and Gd³⁺
indicate that the permeation mechanism of other cations differs from
Ca²⁺ permeation to varying degrees. Ba²⁺, for example, apparently
has a stronger anomalous electron density at site 2 (Extended Data
Fig. 8c), which suggests a higher affinity for that site than site 1. Thus,
knock-off of Ba²⁺ from site 2 to site 3 may be slower and more
rate-limiting than knock-off from site 1 to site 2. Larger and more positively
charged ions such as Gd³⁺ may permeate differently from divalent cations,
since their high charge density may preclude simultaneous binding
at sites 1 and 2. Nevertheless, trivalents probably block divalents from
permeating by virtue of their strong positive charge, which results in
higher affinity binding at site 1. Likewise, Ca²⁺ and Mg²⁺ probably
block monovalent currents through an analogous mechanism. Further
studies will be necessary to elucidate the intricate details of cation
permeation and selectivity in epithelial Ca²⁺ channels.


Online Content Methods, along with any additional Extended Data display items and
Source Data, are available in the online version of the paper; references unique to
these sections appear only in the online paper.

METHODS


No statistical methods were used to predetermine sample size. The experiments 
were not randomized. The investigators were not blinded to allocation during 
experiments and outcome assessment. 

Constructs. Using fluorescence-detection size-exclusion chromatography
(FSEC) (ref. 32), we screened numerous TRPV5 and TRPV6 orthologues fused to
enhanced green fluorescent protein (eGFP) (ref. 33) at the C terminus and
identified rat TRPV6 (GenBank EDM15484.1) as the best candidate for
crystallographic trials. The fortuitous spontaneous mutation L495Q generated
during gene synthesis was found to increase the expression level of rat TRPV6.
C-terminal truncation mutants of rat TRPV6-L495Q produced crystals in the
C222 space group that diffracted to ~6 Å resolution. On the basis of an
initial low-resolution molecular replacement solution, we designed numerous
mutations aimed at improving crystal packing, including individual
substitutions of surface residues, fusions with soluble protein partners and
flexible loop deletions. Incorporation of the surface residue
mutations L92N and M96Q helped improve the resolution limit of crystals in
the C222 space group to ~4.0 Å. Further screening of surface residue mutations
yielded the amino-acid substitution I62Y, which facilitated crystallization in
the P42₁2 space group and improved diffraction resolution to 3.25 Å.
Inspection of protein-mediated crystal contacts in this crystal form shows that
cation-π and/or hydrogen-bonding interactions involving the side chains of
I62Y, K63, K66 and F67 might have permitted crystallization in the P42₁2 space
group and contributed to the improved resolution (Extended Data Fig. 9). The
final construct, TRPV6cryst, comprises residues 1-668 and contains the point
mutations I62Y, L92N, M96Q and L495Q.

Expression and purification. TRPV6cryst was introduced into a pEG BacMam
vector (ref. 34) with a C-terminal thrombin cleavage site (LVPRG) followed by
eGFP and a streptavidin affinity tag (WSHPQFEK). Baculovirus was made in Sf9
cells (Thermo Fisher Scientific, mycoplasma test negative). For large-scale
expression, suspension-adapted HEK 293S cells lacking
N-acetyl-glucosaminyltransferase I (GnTI⁻) (ATCC, mycoplasma test negative)
were grown in Freestyle 293 media (Life Technologies) supplemented with 2% FBS
at 37 °C in the presence of 5% CO₂. The culture was transduced with P2
baculovirus once cells reached a density of 2.5 × 10⁶ to 3.5 × 10⁶ per
millilitre. After 8-12 h, 10 mM sodium butyrate was added and
the temperature was changed to 30 °C. Cells were harvested 48-72 h after
transduction and resuspended in a buffer containing 150 mM NaCl, 20 mM Tris-HCl
pH 8.0, 1 mM β-mercaptoethanol (βME), 0.8 µM aprotinin, 2 µg ml⁻¹ leupeptin,
2 mM pepstatin A and 1 mM phenylmethylsulfonyl fluoride (PMSF). The cells were
disrupted using a Misonix Sonicator (12 × 15 s, power level 7), and the
resulting homogenate was clarified using a Sorvall centrifuge at 9,900g for
15 min. Crude membranes were collected by ultracentrifugation for 1 h in a
Beckman Ti45 rotor at 186,000g. The membranes were mechanically homogenized and
subsequently solubilized for 2-4 h in a buffer containing 150 mM NaCl, 20 mM
Tris-HCl pH 8.0, 1 mM βME, 20 mM n-dodecyl-β-D-maltopyranoside (DDM), 0.8 µM
aprotinin, 2 µg ml⁻¹ leupeptin, 2 mM pepstatin A and 1 mM PMSF. After insoluble
material was removed by ultracentrifugation, streptavidin-linked resin was
added to the supernatant and rotated for 4-16 h. Resin was washed with 10
column volumes of wash buffer containing 150 mM NaCl, 20 mM Tris pH 8.0, 1 mM
βME and 1 mM DDM, and the protein was eluted using wash buffer supplemented
with 2.5 mM D-desthiobiotin. The eluted fusion protein was concentrated to
~1.0 mg ml⁻¹ and digested with thrombin at a mass ratio of 1:100
(thrombin:protein) for 1.5 h at 22 °C. The digested protein was concentrated
and injected into a Superose 6 column equilibrated in a buffer composed of
150 mM NaCl, 20 mM Tris-HCl pH 8.0, 1 mM βME and 0.5 mM DDM.
Tris(2-carboxyethyl)phosphine (TCEP; 10 mM) was added to fractions with elution
time corresponding to the tetrameric channel, and protein was concentrated to
2.5-3.0 mg ml⁻¹ using a 100 kDa MWCO concentrator. All purification steps were
conducted on ice or at 4 °C. Typical purifications
yielded ~1 mg of purified protein per litre of transduced cells.

Protocols to express selenomethionine-labelled protein in HEK cells were
adapted from the literature (ref. 35). Six to 8 h after transduction, cells
were pelleted and resuspended in DMEM (Life Technologies) supplemented with
10% FBS and lacking L-methionine. After shaking methionine-depleted cells for
6 h at 37 °C, 60 mg of L-selenomethionine was added per litre of cells.
Thirty-six to 48 h after transduction, cells were harvested and protein was
purified using the same protocol as described above, except for the addition
of 4 mM L-methionine to all purification buffers, excluding the final gel
filtration buffer. This procedure yielded ~0.4 mg of
selenomethionine-labelled protein per litre of transduced cells.

Crystallization and structure determination. Initial high-throughput vapour
diffusion crystallization screens showed that purified TRPV6cryst crystallizes
in numerous conditions containing low molecular mass polyethylene glycols (PEG
300, PEG 350 monomethyl ether (MME), PEG 400 or PEG 550 MME). The best
crystals were grown using a reservoir solution consisting of 20-24% PEG 350
MME, 100 mM NaCl and 100 mM Tris-HCl pH 8.0-8.5. To increase crystal size,
50 mM ammonium formate was added to the protein immediately before
crystallization. Two microlitres of protein were mixed with 1.0-1.2 µl of
reservoir solution, and incubated at 20 °C in hanging-drop vapour diffusion
trays. Crystals grew as thin plates and reached full size
(~400 µm × ~120 µm × ~20 µm) within 2 weeks.
Crystals were cryoprotected by incubating for a short time in a solution
containing 33-36% PEG 350 MME, 100 mM NaCl, 100 mM Tris-HCl pH 8.2, 0.5 mM DDM
and 50 mM ammonium formate, and flash frozen in liquid nitrogen. To obtain
crystals with Ca²⁺, Ba²⁺ or Gd³⁺, protein was incubated with 10 mM CaCl₂,
10 mM BaCl₂ or 1 mM GdCl₃, respectively, for at least 1 h at 4 °C before
crystallization. Crystals of selenomethionine-labelled protein were grown and
cryoprotected using the same procedure as crystals of native protein.

Diffraction data collected at APS (beamlines 24-ID-C/E), NSLS (beamlines X25
or X29) or ALS (beamlines 5.0.1 or 5.0.2) were processed using XDS (ref. 36)
or HKL2000 (ref. 37). The initial structural solution was obtained by
molecular replacement using Phaser (ref. 38) and the structure of mouse TRPV6
ankyrin domain (PDB accession number 2RFA) (ref. 39) as a search probe, and
the rest of the molecule was iteratively built using the rat TRPV1 structure
(PDB accession number 3J5P) (ref. 14) as a guide. The model
encompasses most of the polypeptide (residues 27-637), excluding parts of the
S2-S3 linker (residues 409-416) and S4-S5 linker (residues 471-479), which were
not clearly visible in the electron density map. The model was refined by
alternating cycles of building in COOT (ref. 40) and automatic refinement in
Phenix (ref. 41) or Refmac (ref. 42). Correct sequence registry was aided by
anomalous difference Fourier maps calculated from crystals grown in the
presence of 10 mM Ca²⁺ to highlight sulfur atoms
of cysteines and methionines, and from crystals labelled with selenomethionine
to highlight selenium atoms (Extended Data Fig. 3). To confirm sequence
registry in the C-terminal region, where native methionines are absent,
selenomethionine-labelled crystals were produced for protein containing a
methionine substitution at L630 (L630M). The anomalous difference Fourier maps
were calculated from X-ray diffraction data collected at 1.75 Å for Ca²⁺ and
Ba²⁺, 1.56 Å for Gd³⁺ and 0.979 Å for selenium. All structural figures were
prepared in PyMol (ref. 43). Surface representation of the ion permeation
pathway was generated using the PyMol plugin version of Caver (ref. 44). The
pore radius was calculated using HOLE (ref. 45).

Fura-2 AM measurements. Wild-type rat TRPV6 or TRPV6cryst fused to a
C-terminal strep tag was expressed in HEK cells as described above. Forty-eight
to 72 h after transduction, cells were harvested by centrifugation at 600g for
5 min. The cells were resuspended in pre-warmed modified HBS (118 mM NaCl,
4.8 mM KCl, 1 mM MgCl₂, 5 mM D-glucose, 10 mM HEPES pH 7.4) containing
5 µg ml⁻¹ of Fura-2 AM (Life Technologies) and incubated at 37 °C for 45 min.
The loaded cells were then centrifuged for 5 min at 600g, and resuspended in
pre-warmed, modified HBS and incubated again at 37 °C for 20-30 min in the
dark. The cells were subsequently pelleted and washed twice, then resuspended
in modified HBS for experiments. The cells were kept on ice in the dark for a
maximum of ~2 h before fluorescence measurements, which were conducted using a
QuantaMaster 40 spectrofluorometer (Photon Technology International) at ~25 °C
in a quartz cuvette under constant stirring. Intracellular Ca²⁺ was measured
by taking the ratio of two excitation wavelengths (340 and 380 nm) at one
emission wavelength (510 nm). The excitation wavelength was switched at 1-s
intervals.
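
The ratiometric readout described above amounts to dividing the emission recorded under 340-nm excitation by the emission recorded under 380-nm excitation. The sketch below is a minimal illustration, assuming that alternating readings have already been paired and that an optional constant background is subtracted from each channel. The function name `fura2_ratio`, the background parameters and all numerical values are hypothetical and are not taken from the original analysis.

```python
def fura2_ratio(f340, f380, background340=0.0, background380=0.0):
    """Return background-corrected F340/F380 ratios for paired readings.

    f340, f380: sequences of emission intensities (510 nm) recorded under
    340-nm and 380-nm excitation, paired in time order (an assumption).
    """
    if len(f340) != len(f380):
        raise ValueError("need one 380-nm reading per 340-nm reading")
    ratios = []
    for a, b in zip(f340, f380):
        denominator = b - background380
        if denominator <= 0:
            raise ValueError("380-nm signal must exceed its background")
        ratios.append((a - background340) / denominator)
    return ratios


if __name__ == "__main__":
    # Hypothetical intensities, for illustration only.
    print(fura2_ratio([2.0, 3.0], [1.0, 1.5]))
```

A rise in the ratio after addition of a permeant ion then reports an increase in intracellular Ca²⁺, independently of how many cells or how much dye are in the cuvette.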

Isothermal titration calorimetry experiments. To study the energetics of Gd³⁺
binding, we performed ITC experiments. For these, we used a MicroCal
Auto-iTC200 (Malvern Instruments) instrument at the Columbia University ITC
Facility. Wild-type TRPV6 protein was purified in buffer containing 20 mM Tris,
150 mM NaCl, 1 mM DDM and 1 mM βME (buffer A), and the same buffer A was
also used to dissolve the desired concentrations of Gd³⁺ to avoid buffer
mismatch. The experiments were performed at 25 °C using 2-µl volume injections
for the titration and 700 rpm stirring speed for mixing the reactants. The
experiments were performed by titrating 700 µM Gd³⁺ (by robotically controlled
syringe) to 6.38 µM TRPV6 (in cell) at 3-min intervals. The control
experiments were performed to calculate the heat of dilution for each
injection by injecting the same volumes of Gd³⁺ into buffer A. The data were
analysed using a specialized program in Origin (MicroCal ITC).
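
The control titration described above is used to correct each injection heat for the heat of dilution. The sketch below shows that bookkeeping under simple assumptions: one control value per sample injection, subtracted point by point. The function names and the example heats are hypothetical; only the 2-µl injection volume and the 700 µM Gd³⁺ syringe concentration come from the text, and the actual analysis was performed in Origin.

```python
INJECTION_VOLUME_UL = 2.0        # µl per injection (from the text)
SYRINGE_CONCENTRATION_UM = 700.0  # µM Gd3+ in the syringe (from the text)


def subtract_heat_of_dilution(sample_heats, control_heats):
    """Subtract the control (Gd3+ into buffer) heat from each injection heat."""
    if len(sample_heats) != len(control_heats):
        raise ValueError("need one control heat per sample injection")
    return [s - c for s, c in zip(sample_heats, control_heats)]


def injected_gd_nmol(n_injections):
    """Cumulative Gd3+ delivered after n injections, in nmol.

    µl × µM = pmol, so dividing by 1000 converts the product to nmol.
    """
    return n_injections * INJECTION_VOLUME_UL * SYRINGE_CONCENTRATION_UM / 1000.0


if __name__ == "__main__":
    # Hypothetical heats (arbitrary units), for illustration only.
    print(subtract_heat_of_dilution([-5.0, -4.0], [-1.0, -1.0]))
    print(injected_gd_nmol(10))
```

Each 2-µl injection of the 700 µM stock delivers 1.4 nmol of Gd³⁺, so the corrected heats can be plotted against the cumulative amount of ligand added.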

Cysteine crosslinking experiments. For SDS-PAGE and FSEC analysis, cysteine
substitutions were introduced into the TRPV6cryst background with five exposed
cysteines mutated to alanine or serine (C14S, C20S, C70A, C610A and C618A),
and the surface mutation I62Y was reverted to the native isoleucine.
Cysteine-substituted mutants with C-terminal eGFP and streptavidin affinity tag were
expressed in HEK cells in the same way as protein for crystallization and purified
with a modified protocol. Crude cell pellets were resuspended in buffer containing
150 mM NaCl, 20 mM Tris-HCl pH 8.0, 1 mM BME, 20 mM DDM, 0.8 µM
aprotinin, 2 µg ml⁻¹ leupeptin, 2 mM pepstatin A and 1 mM PMSF, and stirred for 1–3 h.
After insoluble material was removed by ultracentrifugation, streptavidin-linked
resin was added to the supernatant and rotated for 4–16 h. Further steps were
performed in an identical manner to protein purification for crystallization as
described above, with the exceptions that the final gel filtration buffer lacked BME,
and TCEP was not added to purified protein. Within 24 h of purification, the
protein samples were run on a 4–20% SDS-PAGE gel and visualized by Coomassie blue
staining. A small portion of protein was subjected to FSEC analysis.
Extended Data Figure 1 | Functional characterization of wild-type
rat TRPV6 and TRPV6cryst. a, b, d, e, g, h, Representative ratiometric
fluorescence measurements for HEK cells expressing wild-type rat TRPV6
(a, d, g) or TRPV6cryst (b, e, h). Arrows indicate the time at which the
corresponding ion was added. After resuspending the cells in nominally
calcium-free buffer, addition of Ca²⁺ (a, b) or Ba²⁺ (d, e) resulted in a
robust concentration-dependent increase in Fura-2 signal for both
wild-type rat TRPV6 and TRPV6cryst. In contrast, pre-incubation of
cells with increasing concentrations of Gd³⁺ resulted in a
concentration-dependent reduction in Fura-2 signal for both wild type (g) and
TRPV6cryst (h), consistent with Gd³⁺ inhibition of wild-type TRPV6
demonstrated previously using Ca²⁺ uptake measurements. c, f,
Dose–response curves for Ca²⁺ (c) and Ba²⁺ (f) permeation calculated for
wild type (blue) and TRPV6cryst (red) (n = 3 for all measurements). The
changes in the fluorescence intensity ratio at 340 and 380 nm (F340/F380)
were normalized to their approximated maximal values at saturating
concentrations of Ca²⁺ or Ba²⁺, respectively. The apparent values of
half-maximum effective concentration (EC50) for TRPV6cryst
(1.70 ± 0.26 mM for Ca²⁺ and 1.27 ± 0.67 mM for Ba²⁺) are similar
to wild type (1.47 ± 0.80 mM for Ca²⁺ and 1.91 ± 0.74 mM for Ba²⁺).
i, Dose–response curves for Gd³⁺ inhibition calculated for wild type
(blue) and TRPV6cryst (red) (n = 3 for all measurements). The changes in
F340/F380 evoked by addition of 2 mM Ca²⁺ after pre-incubation with
various concentrations of Gd³⁺ were normalized to the maximal change in
F340/F380 after addition of 2 mM Ca²⁺ in the absence of Gd³⁺. The apparent
values of half-maximum inhibitory concentration (IC50) for wild type
(3.87 ± 0.83 µM) are comparable to TRPV6cryst (2.57 ± 0.28 µM). Overall,
the mutations introduced to crystallize TRPV6 did not significantly
alter its cation permeation and inhibition properties. The absence of
time-dependent decay of the Fura-2 AM signal in the case of TRPV6cryst
is presumably due to its C-terminal truncation, which eliminated a
calmodulin-binding site involved in Ca²⁺-dependent inactivation of
TRPV6 (ref. 46). Error bars, s.e.m.
Extended Data Figure 2 | Electron density. a, Stereo view of 2Fo − Fc
electron density map (blue mesh, 45–3.25 Å, 1.0σ) superimposed onto a
ribbon model for the entire TRPV6cryst monomer. b–g, Close-up views
of the 2Fo − Fc map for various portions of the TRPV6cryst model, with
side chains shown in stick representation. In e, two diagonally opposed
subunits are shown to clarify the position of the central pore axis, and the
bound Ca²⁺ ion is shown as a green sphere. In f, inset shows expanded
view of the boxed region, demonstrating electron density for connectivity
in the S6–TRP helix linker that is distinct from other TRP channel
structures (refs 14, 16, 17).
Extended Data Figure 3 | Anomalous difference Fourier maps for
sulfur and selenium. a–c, Fragments of the TRPV6cryst model (yellow
ribbon) superimposed onto anomalous difference Fourier maps from
X-ray diffraction data collected at 1.75 Å from crystals grown in 10 mM
Ca²⁺ (cyan mesh, 38–4.59 Å, 3.0σ) and at 0.979 Å from selenomethionine-labelled
crystals (pink mesh, 30–5.00 Å, 3.2σ) of TRPV6cryst. Anomalous
signal collected from a selenomethionine-labelled crystal of TRPV6cryst
with L630M substitution (a, green mesh, 30–7.20 Å, 3.2σ) was used to
aid registry in the C-terminal β3 strand. Domains are labelled in blue.
Cysteine and methionine residues are shown as sticks and labelled.
Sulfur anomalous difference peaks were observed for all cysteines in the
TRPV6cryst model. Selenium anomalous difference peaks were observed
for all methionines in the model, except for M480 and M484 in S5,
presumably because of flexibility.


© 2016 Macmillan Publishers Limited. All rights reserved 


Extended Data Figure 4 | Comparison of TRPV6cryst and TRPV1.
a, Bottom-up view of TRPV6cryst (blue) and TRPV1 (salmon) tetramers,
with ankyrin repeat domain and linker domain helices shown as
cylinders. When S1–S4 domains are aligned, as shown, the cytoplasmic
skirt of TRPV6 is rotated clockwise with respect to the cytoplasmic
skirt of TRPV1. b, Side view of TRPV6cryst (blue) and TRPV1 (salmon)
monomers with S1–S4 domain-based alignment. The ankyrin repeat
domain of TRPV1 extends slightly further into the cytoplasm than that of
TRPV6cryst. c, Alignment of TRPV6cryst (blue) and TRPV1 (salmon)
transmembrane domains. Adjacent S1–S4 and pore domains are shown
for comparison. Similar to TRPV1, aromatic residues pack against
each other to immobilize the TRPV6cryst S1–S4 domain core (shown as
sticks). The absence of curvature in S5 and the long extracellular S1–S2
loop protruding towards the pore are distinct features of the TRPV6cryst
transmembrane domain. d, Alignment of the TRPV6cryst TRP helix,
C-terminal hook and three-stranded β-sheet with homologous domains in
TRPV1. Conserved residues (Extended Data Fig. 7) are shown in stick
representation.
Extended Data Figure 5 | Cysteine crosslinking at the intracellular skirt
interface and putative desthiobiotin-binding site at the intracellular
intersubunit interface. a, The TRPV6cryst tetramer with each subunit
coloured differently (top) and expanded view of boxed region (bottom),
with cysteine-substituted residues shown as sticks. Dashed line and
label show Cα–Cα distance. b, SDS-PAGE (4–20% gradient gel) analysis
of purified TRPV6 cysteine-substituted mutants in the presence (left)
and absence (right) of reducing agent. Cysteines were introduced into
a background construct (TRPV6cysko), in which exposed cysteines in
TRPV6cryst were mutated to serine or alanine (C14S, C20S, C70A, C610A
and C618A) to prevent non-specific aggregation. Positions corresponding
to monomer and tetramer bands are indicated by filled and open triangles,
respectively. The appearance of a robust band corresponding to covalently
crosslinked tetramer in the D34C–R631C double mutant indicates that the
interacting N-terminal helix (which precedes the S1–S4 domain) and β3
strand (which follows the TRP helix) are from different protomers. Taken
together with the S6–TRP helix linker connectivity (Extended Data Fig. 2f)
that is different from TRPV1/2 (refs 14, 16) and TRPA1 (ref. 17), these
data suggest a non-swapped arrangement of the pore and S1–S4 domains;
if the canonical domain-swapped arrangement were true, the interacting
N-terminal helix and β3 strand would be from the same monomer and
no crosslinked high molecular mass species would form. However, in the
absence of interpretable density for the S4–S5 linker, we suggest cautious
interpretation of this domain arrangement. c, FSEC analysis of purified
TRPV6cysko crosslink mutants in the absence of reducing agent. Each
trace shows a single major peak with elution time corresponding to the
TRPV6cryst tetramer (black trace). d, e, The putative DTB-binding site is
composed of a pocket formed by the N-terminal helix and ankyrin repeats
2–4 of one subunit (blue) and the linker domain of an adjacent subunit
(green). DTB is shown as ball and stick, with 2Fo − Fc density shown as
grey mesh (45–3.25 Å, 1.0σ). In d, residues that contact DTB are shown
as sticks. In e, the binding pocket is shown in surface representation.
Interestingly, the DTB-binding site overlaps with the ATP-binding site
revealed in the ankyrin domain crystal structure of TRPV1 (ref. 47),
which was later demonstrated to be conserved in TRPV3 and TRPV4
(ref. 48). The presence of DTB close to this location in TRPV6
corroborates the assertion made in ref. 14 that ligands bound in this region
modulate activity by perturbing subunit interactions. Further work is
necessary to establish a functional role, if any, of DTB-like compounds
on TRPV6 function. f–h, Comparison of the putative DTB-binding site
in TRPV6cryst (f) and the ATP-binding site in the crystal structure of
the TRPV1 ankyrin domain (g, PDB accession number 2PNN). DTB
and ATP are shown in ball and stick. While the ATP-binding site in
TRPV1 is shifted towards ankyrin repeat finger 1, both binding sites are
located at intersubunit interfaces, as illustrated when the structures are
superimposed (h).
Extended Data Figure 6 | Comparison of the ion channel pore in
TRPV6cryst with other tetrameric ion channels. a–l, The pore of
TRPV6cryst (yellow ribbon) was aligned with TRPV1 (a, PDB accession
number 3J5P), NavAb (b, PDB accession number 3RVY), Slo2.2 (c, PDB
accession number 5A6E), TRPA1 (d, PDB accession number 3J9P),
Kv1.2 (e, PDB accession number 2R9R), KcsA (f, PDB accession number
1BL8), InsP3R1 (g, PDB accession number 3JAV), RyR1 (h, PDB accession
number 3J8H), NavRh (i, PDB accession number 4DXW), CavAb
(j, PDB accession number 4MVM), Cav1.1 domains I and III (k, PDB
accession number 3JBR) and Cav1.1 domains II and IV (l, PDB accession
number 3JBR). In each of the alignments, acidic residues located at or
close to the selectivity filter region are shown as sticks for comparison.
Notably, structures of Ca²⁺-permeable channels (a, d, g, h, j–l) display
a high concentration of acidic residues in the outer pore region. In
a–c, methionine residues close to the S6 bundle crossing are shown as
sticks. Notably, the methionine at the lower gate points away from the pore
in TRPV1 (a), despite high sequence conservation in this region among
TRPV channels (Extended Data Fig. 7). In NavAb (b) and Slo2.2 (c),
methionine side chains occlude the lower gate as in TRPV6cryst, indicating
that the closed conformation of the lower gate can be chemically similar
for Na⁺-, K⁺- and Ca²⁺-selective channels. m–o, Comparison of
calcium-binding sites in TRPV6cryst (m), the engineered voltage-gated Ca²⁺
channel CavAb (n) and the putative Ca²⁺ site in Cav1.1 (o, domains I and
III are shown). Residues constituting the selectivity filters are shown in
stick representation. Ca²⁺ ions are shown as green spheres. Sites 1 and 2
from TRPV6cryst overlap with the positions of sites 1 and 3 from CavAb,
respectively. While it has been proposed that, owing to electrostatic
repulsion, sites 1, 2 and 3 cannot be simultaneously occupied in CavAb,
distances between Ca²⁺-binding sites in TRPV6cryst are sufficiently large
that they can be simultaneously occupied. The putative Ca²⁺ site in
Cav1.1 is near the equivalent location of site 2 in CavAb.


Extended Data Figure 7 | Sequence alignment of rat TRPV subtypes.
Secondary structure elements are depicted above the sequence as cylinders
(α-helices), arrows (β-strands) and lines (loops). Dashed lines show
residues in the TRPV6cryst construct not included in the TRPV6cryst
structural model. Red boxes and a red arrow highlight substitution
mutations and the C-terminal truncation point in TRPV6cryst, respectively
(see Methods). The ¥ symbol marks the N-linked glycosylation site in the
extracellular loop connecting S1 and S2 conserved in TRPV6 (and TRPV5)
channels. The thick red line marks the location of the selectivity filter.
Extended Data Figure 8 | Isothermal titration calorimetry analysis of
TRPV6 interaction with Gd³⁺ and anomalous peak amplitudes. a, Gd³⁺
in the syringe (700 µM) was titrated into TRPV6 (6.38 µM) loaded into the
cell. Measurements were performed at 25 °C. Top, the raw data for nineteen
2-µl injections of Gd³⁺. The area of each injection peak is equal to the total
heat released from that injection. Bottom, the integrated heat per injection
versus molar ratio. Binding of Gd³⁺ to TRPV6 was analysed using models
with one and two types of binding site. A model with one type of binding
site was not sufficient to explain the binding isotherm (blue line). In
contrast, analyses of the binding isotherm using the model with two
types of binding site, according to the equation

$$Q_i^{\mathrm{tot}} = V_0 M_{\mathrm{tot}} \left( \frac{n_1 \Delta H_1 K_1 [X]}{1 + K_1 [X]} + \frac{n_2 \Delta H_2 K_2 [X]}{1 + K_2 [X]} \right),$$

where $Q_i^{\mathrm{tot}}$ is total heat after the ith injection, $V_0$ is the
volume of the calorimetric cell, $M_{\mathrm{tot}}$ is the bulk
concentration of protein, $[X]$ is the free concentration of Gd³⁺, $n_1$ and $n_2$
are the numbers of type 1 and 2 sites, $K_1$ and $K_2$ are the observed
equilibrium constants for each type of site and $\Delta H_1$ and $\Delta H_2$ are the
corresponding enthalpy changes, satisfactorily described the data (red
line), and the corresponding values of thermodynamic parameters are
given in b. The values of $\Delta G$ and $T\Delta S$ were calculated using the following
relationships: $\Delta G = -RT \ln K$ and $\Delta G = \Delta H - T\Delta S$. b, Table showing
the parameters of experimental data fitting to the model with two types of
Gd³⁺-binding site. The straightforward interpretation of the ITC results
is that the ITC type 1 (n = 1) and type 2 (n ≈ 4) sites represent the main
(site 1) and recruitment sites identified crystallographically (Fig. 4e, f).
Correspondingly, the affinity to Gd³⁺ for recruitment sites is ~10 times
lower than for site 1. c, Table showing anomalous peak amplitudes in σ
calculated from data collected for Ca²⁺ (38–4.59 Å), Ba²⁺ (38–4.59 Å) and
Gd³⁺ (38–4.59 Å). No numbers are given if the peaks were not observed.
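
The two-site relationship and the free-energy conversions in this caption can be written out as a short sketch. It is an illustration only, not the Origin fitting routine used for the analysis; the function names are hypothetical, and the units (litres, mol l⁻¹, association constants in M⁻¹, enthalpies in J mol⁻¹) are assumptions chosen so that the heat comes out in joules.

```python
import math

R_GAS = 8.314462618  # molar gas constant, J mol^-1 K^-1


def two_site_heat(x_free, v0, m_tot, n1, dh1, k1, n2, dh2, k2):
    """Total heat Q_tot for two independent types of binding site.

    x_free: free ligand concentration [X] (M); v0: cell volume (l);
    m_tot: bulk protein concentration (M); n1, n2: sites per protein;
    dh1, dh2: enthalpy changes (J/mol); k1, k2: association constants (1/M).
    """
    site1 = n1 * dh1 * k1 * x_free / (1.0 + k1 * x_free)
    site2 = n2 * dh2 * k2 * x_free / (1.0 + k2 * x_free)
    return v0 * m_tot * (site1 + site2)


def gibbs_energy(k_assoc, temp_k=298.15):
    """Delta G = -RT ln K, in J/mol (default temperature 25 degrees C)."""
    return -R_GAS * temp_k * math.log(k_assoc)


def entropy_term(delta_g, delta_h):
    """T*Delta S from Delta G = Delta H - T*Delta S, in J/mol."""
    return delta_h - delta_g
```

Under these relationships, a tenfold difference in affinity at 25 °C corresponds to a difference in ΔG of RT ln 10, roughly 5.7 kJ mol⁻¹, which is the scale separating site 1 from the recruitment sites.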
Extended Data Figure 9 | Crystal lattice contact of TRPV6cryst.
a, b, Two views of TRPV6cryst crystal packing in the P42₁2 space group.
A single TRPV6cryst protomer in the asymmetric unit is shown in blue.
c, d, Close-up views of boxed region in a. Contacting residues are shown
in stick, and Cα–Cα distances are labelled in d. The crystal contact is
apparently mediated by cation–π and/or hydrogen bonding interactions
between these residues. Crystals in the P42₁2 space group did not form
when the native isoleucine was present at position 62.
Extended Data Table 1 | Data collection and refinement statistics

| | Native | Ba²⁺ | Ca²⁺ | Gd³⁺ | L630M-SeMet | SeMet |
|---|---|---|---|---|---|---|
| Data collection | | | | | | |
| Beamline | APS-24ID-C | APS-24ID-C | APS-24ID-C | APS-24ID-C | APS-24ID-E | APS-24ID-E |
| Space group | P42₁2 | P42₁2 | P42₁2 | P42₁2 | P42₁2 | P42₁2 |
| Cell dimensions a, b, c (Å) | 143.81, 143.81, 113.22 | 144.35, 144.35, 113.37 | 144.35, 144.35, 113.37 | 144.35, 144.35, 113.37 | 143.60, 143.60, 114.44 | 143.95, 143.95, 113.04 |
| Cell dimensions α, β, γ (°) | 90, 90, 90 | 90, 90, 90 | 90, 90, 90 | 90, 90, 90 | 90, 90, 90 | 90, 90, 90 |
| Wavelength (Å) | 0.9791 | 1.75 | 1.75 | 1.7101 | 0.9792 | 0.9792 |
| Resolution (Å)* | 44.48–3.25 (3.36–3.25) | 49.56–3.85 (3.99–3.85) | 49.56–3.65 (3.78–3.65) | 50.00–3.80 (3.936–3.80) | 40.00–7.20 (7.46–7.20) | 40.00–5.00 (5.18–5.00) |
| Completeness (%)* | 96.0 (94.7) | 99.5 (95.4) | 99.9 (99.8) | 99.5 (97.3) | 99.9 (100.0) | 100.0 (100.0) |
| Redundancy* | 8.7 (9.2) | 15.4 (13.9) | 26.5 (17.2) | 11.4 (6.4) | 16.3 (17.5) | 13.7 (13.9) |
| I/σ* | 16.9 (1.3) | 15.2 (1.5) | 25.0 (2.4) | 19.4 (1.6) | 27.4 (6.3) | 21.8 (4.7) |
| Rmeas (%)* | 9.8 (132.6) | 13.1 (228.5) | 10.6 (143.1) | 8.9 (120.7) | 20.7 (86.4) | 19.2 (93.5) |
| CC1/2 | 99.8 (85.7) | 98.0 (76.8) | 99.5 (85.7) | 99.9 (63.7) | 98.5 (89.8) | 98.3 (92.5) |
| Refinement | | | | | | |
| Resolution (Å)* | 44.48–3.25 (3.36–3.25) | 49.56–3.85 (3.99–3.85) | 50.00–3.65 (3.78–3.65) | 49.56–3.80 (3.94–3.80) | | |
| Completeness (%) | 96 (93.8) | 100 (99.9) | 100 (99.9) | 99 (96.9) | | |
| Number of reflections | 18531 (1724) | 21705 (2187) | 25439 (2521) | 22443 (2170) | | |
| Rwork/Rfree† | 0.273/0.289 | 0.291/0.326 | 0.276/0.281 | 0.298/0.321 | | |
| Number of atoms, total | 4747 | 4775 | 4735 | 4759 | | |
| Number of atoms, ligand | 16 | 19 | 18 | 17 | | |
| B-factor (Å²), protein | 120.5 | 143.8 | 135.1 | 144.23 | | |
| B-factor (Å²), ligand | 77.27 | 136.75 | 24.14 | 178.86 | | |
| RMS deviations, bond length (Å) | 0.003 | 0.002 | 0.003 | 0.002 | | |
| RMS deviations, bond angles (°) | 0.7 | 0.62 | 0.62 | 0.63 | | |
| Ramachandran favored (%) | 93.6 | 92.9 | 93.7 | 92.8 | | |
| Ramachandran allowed (%) | 5.7 | 6.93 | 6.13 | 7.03 | | |
| Ramachandran disallowed (%) | 0.17 | 0.17 | 0.17 | 0.17 | | |

*Highest resolution shell in parentheses.
†Five per cent of reflections were used for the calculation of Rfree.
The first gravitational-wave source from the isolated
evolution of two stars in the 40–100 solar mass range


Krzysztof Belczynski¹, Daniel E. Holz², Tomasz Bulik¹ & Richard O'Shaughnessy³


The merger of two massive (about 30 solar masses) black holes
has been detected in gravitational waves¹. This discovery validates
recent predictions²⁻⁴ that massive binary black holes would
constitute the first detection. Previous calculations, however, have
not sampled the relevant binary-black-hole progenitors (massive,
low-metallicity binary stars) with sufficient accuracy nor included
sufficiently realistic physics to enable robust predictions to better
than several orders of magnitude⁵⁻¹⁰. Here we report high-precision
numerical simulations of the formation of binary black holes via the 
evolution of isolated binary stars, providing a framework within 
which to interpret the first gravitational-wave source, GW150914, 
and to predict the properties of subsequent binary-black-hole 
gravitational-wave events. Our models imply that these events 
form in an environment in which the metallicity is less than ten 
per cent of solar metallicity, and involve stars with initial masses 
of 40-100 solar masses that interact through mass transfer and a 
common-envelope phase. These progenitor stars probably formed 
either about 2 billion years or, with a smaller probability, 11 billion 
years after the Big Bang. Most binary black holes form without 
supernova explosions, and their spins are nearly unchanged since 
birth, but do not have to be parallel. The classical field formation of 
binary black holes we propose, with low natal kicks (the velocity of 
the black hole at birth) and restricted common-envelope evolution, 
produces approximately 40 times more binary-black-hole mergers
than do dynamical formation channels involving globular clusters¹¹;
our predicted detection rate of these mergers is comparable to that
from homogeneous evolution channels¹²⁻¹⁵. Our calculations
predict detections of about 1,000 black-hole mergers per year with 
total masses of 20-80 solar masses once second-generation ground- 
based gravitational-wave observatories reach full sensitivity. 

We study the formation of coalescing black-hole binaries using
the StarTrack population synthesis code¹⁶,¹⁷. This method has been
updated to account for the formation of massive black-hole systems 
in isolated stellar environments. The new key factors include an obser- 
vationally supported star-formation rate, chemical enrichment across 
cosmic time and a revised initial condition for evolution of binary stars. 
Hitherto, simulations have been unable to achieve the desired predic- 
tive power because of the limitations on the input physics (for example, 
limited metallicity range) and numerical accuracy. To ensure that the
dominant contribution from intrinsically rare low-metallicity star-forming
environments is adequately sampled, we use a dense grid of
metallicities (32 metallicities) with high precision (20 million binaries each).

Although binary population synthesis is dependent on a number of 
uncertain physical factors, there has been recent progress in reducing 
this uncertainty and understanding how it affects predictions. In light 
of this, we consider the following three models to encompass major 
sources of uncertainty (Methods): M1 represents our ‘standard’ classical 
formation model for double compact objects composed of two black 
holes (BH-BH), two neutron stars (NS-NS), or one of each (BH-NS); 


M2 is our ‘optimistic’ model, in which Hertzsprung-gap stars may
initiate and survive common-envelope evolution, leading to many more
binaries being formed; and M3 is our ‘pessimistic’ model, in which
black holes receive large natal kicks, which disrupt and thereby reduce
the number of BH-BH progenitor binaries.

For each generated double compact object merger, with its intrinsic
component masses and the redshift of the merger, we estimate
the probability that such a merger would have been detectable in the
first observing run (O1) of the Laser Interferometer Gravitational-Wave
Observatory (LIGO) advanced detectors. We adopt a self-consistent
model of evolution of stellar populations in the Universe,
and we take the representative noise curve for O1
(https://dcc.ligo.org/LIGO-G1501223/public) and assume 16 days of coincident
science-quality observational time.

In Fig. 1 we show the formation and evolution of a typical binary
system that results in a merger with similar masses and at a similar
time to GW150914. Stars that form such mergers are very massive
(40–100 M⊙; M⊙ is the mass of the Sun), and at the end of their
lives they collapse directly to form black holes¹⁸. Because there is no
associated supernova explosion, there is also no mass ejection. We
allow 10% of the collapsing stellar mass to be emitted in neutrinos.
If natal kicks are associated with asymmetric mass ejection (as in
our standard model), then our prediction is that these massive black
holes do not receive natal kicks and that their spin directions are the
same as those of their progenitor collapsing stars. The binary evolution
removes the hydrogen-rich envelope from both binary components,
making both stars compact and luminous Wolf-Rayet stars
before they collapse to black holes. The first binary interaction is a
dynamically stable Roche-lobe overflow phase, whereas the second
interaction consists of a common-envelope phase that produces a
compact binary. After the common-envelope phase, the progenitor
binary resembles two known high-mass X-ray binaries hosting
massive black holes: IC10 X-1 and NGC 300 X-1 (ref. 19). A massive
BH-BH binary (each with a mass of approximately 30 M⊙) is
formed in approximately 5 Myr of evolution, with a relatively wide
orbit (semi-major axis a ≈ 50 R⊙; R⊙ is the radius of the Sun), leading
to a long time to coalescence of t_merger ≈ 10 Gyr. The accretion onto
the first black hole in the common-envelope phase is only modest
(approximately 1.5 M⊙), whereas accretion from the stellar wind of its
companion is rather small (less than 0.1 M⊙).
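
The link between a roughly 50 R⊙ orbit and a coalescence time of about 10 Gyr can be checked with the standard gravitational-wave inspiral time for a circular orbit (Peters 1964), t = (5/256) c⁵a⁴ / (G³ m₁m₂(m₁ + m₂)). The sketch below is an independent back-of-envelope estimate, not the calculation performed in the population synthesis code; it assumes a circular orbit, and the constants and function name are choices made for this example. The inputs are the final BH-BH state shown in Fig. 1 (36.5 M⊙ + 30.8 M⊙ at a = 47.8 R⊙).

```python
G_NEWTON = 6.67430e-11      # m^3 kg^-1 s^-2
C_LIGHT = 2.99792458e8      # m s^-1
M_SUN = 1.98847e30          # kg
R_SUN = 6.957e8             # m
SECONDS_PER_GYR = 3.15576e16


def circular_merger_time_gyr(m1_msun, m2_msun, a_rsun):
    """Gravitational-wave inspiral time for a circular binary (Peters 1964), in Gyr."""
    m1 = m1_msun * M_SUN
    m2 = m2_msun * M_SUN
    a = a_rsun * R_SUN
    t_seconds = (5.0 / 256.0) * C_LIGHT**5 * a**4 / (
        G_NEWTON**3 * m1 * m2 * (m1 + m2)
    )
    return t_seconds / SECONDS_PER_GYR


# BH-BH state from Fig. 1; the small eccentricity (e = 0.05) is neglected here.
t_merge = circular_merger_time_gyr(36.5, 30.8, 47.8)
print(f"estimated time to coalescence: {t_merge:.1f} Gyr")
```

Because the time scales as a⁴, the estimate is very sensitive to the post-common-envelope separation: halving a shortens the delay sixteenfold. A small eccentricity shortens it slightly further.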

To investigate general aspects of the formation history of GW150914,
we select a population of GW150914-like BH-BH mergers with a total
redshifted mass of Mtot,z = 54–73 M⊙, and then further restrict our
sample to binaries that would be detectable in O1. The formation
channels typical for these massive BH-BH mergers are summarized
in Extended Data Table 1.

We find that the most likely progenitor of GW150914 consists of a
primary star in the mass range 40–100 M⊙ and a secondary in the
mass range 40–80 M⊙. In our standard model, the binary formed
in a low-metallicity environment (Z < 0.1 Z⊙; Z⊙ is the metallicity of
the Sun; see Extended Data Fig. 1) and either in the early Universe
(2 Gyr after the Big Bang) or very recently (11 Gyr after the Big Bang).

Figure 1 | Example binary evolution leading to a BH-BH merger similar
to GW150914. A massive binary star (96 M⊙ (blue) + 60 M⊙ (purple)) is
formed in the distant past (2 billion years after Big Bang; z ≈ 3.2; top row),
and after 5 million years of evolution forms a BH-BH system
(37 M⊙ + 31 M⊙; second-last row). For the ensuing 10.3 billion years, this
BH-BH system is subject to loss of angular momentum, with the orbital
separation steadily decreasing, until the black holes coalesce at redshift
z = 0.09. This example binary formed in a low-metallicity environment
(Z = 0.03 Z⊙). MS, main-sequence star; HG, Hertzsprung-gap star; CHeB,
core-helium-burning star; BH, black hole; a, orbital semi-major axis;
e, eccentricity.

| Phase | Time (Myr) | Star 1 | Star 2 | a (R⊙) | e |
|---|---|---|---|---|---|
| Zero-age main sequence | 0.0000 | MS 96.2 M⊙ | MS 60.2 M⊙ | 2,463 | 0.15 |
| Roche-lobe overflow | 3.5445 | HG 92.2 M⊙ | 59.9 M⊙ | 2,140 | 0.00 |
| | 3.5448 | 42.3 M⊙ | 84.9 M⊙ | 3,112 | 0.00 |
| | 3.8354 | He star 39.0 M⊙ | 84.7 M⊙ | 3,579 | 0.00 |
| | 3.8354 | BH 35.1 M⊙ | 84.7 M⊙ | 3,700 | 0.03 |
| Common envelope | 5.0445 | BH 35.1 M⊙ | CHeB 82.2 M⊙ | 3,780 | 0.03 |
| | 5.0445 | BH 36.5 M⊙ | He star 36.8 M⊙ | 43.8 | 0.00 |
| | 5.3483 | BH 36.5 M⊙ | He star 34.2 M⊙ | 45.3 | 0.00 |
| Direct collapse | 5.3483 | BH 36.5 M⊙ | BH 30.8 M⊙ | 47.8 | 0.05 |
| Merger | 10,294 | | | 0 | 0.00 |

The distribution of birth times of these massive BH-BH mergers is
bimodal (Fig. 2 and Extended Data Fig. 2), with a majority of systems
originating from the distant past (55% of binaries; about 2 Gyr after
the Big Bang, corresponding to z ≈ 3) and a smaller contribution from
relatively young binaries (25%; formed about 11 Gyr after the Big Bang,
corresponding to z ≈ 0.2). This bimodality arises from two naturally
competing effects: on the one hand, most low-metallicity star formation
occurs in the early Universe; on the other hand, in contrast to previous
work⁵, significantly more low-metallicity star formation is currently
expected to occur in the low-redshift Universe²⁰. Therefore, as is the case
with binary neutron stars, we anticipate a significant contribution to
the present-day binary-black-hole merger rate from binary black holes
formed in low-redshift, low-metallicity star-forming regions. The
delay-time distribution of BH-BH binaries in our simulations follows a 1/t
distribution. The birth times therefore naturally pile up at low redshifts
(z ≈ 0.1–0.3) and this gives rise to a low-z peak (Extended Data Fig. 2a).
However, the low-metallicity (Z < 0.1 Z⊙) star formation responsible for
the production of massive BH-BH mergers peaks at a redshift of 3
(Extended Data Fig. 2b). The convolution of these two effects produces
the bimodal birth-time distribution (Extended Data Fig. 2c).
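
A toy calculation shows how a 1/t delay-time distribution combined with an early peak in low-metallicity star formation yields two peaks in birth time. Everything numerical here is an assumption made for illustration: the Gaussian star-formation history peaking at 2 Gyr, the small constant floor standing in for low-redshift low-metallicity star formation, the 13.7-Gyr present-day age and the minimum delay. None of these are the inputs of the actual simulations, and detection weighting is ignored.

```python
import math


def toy_low_metallicity_sfr(t_gyr):
    """Toy low-metallicity star-formation history (arbitrary units).

    Gaussian peak at 2 Gyr after the Big Bang plus a small constant floor;
    both are illustrative assumptions.
    """
    return math.exp(-0.5 * ((t_gyr - 2.0) / 1.5) ** 2) + 0.05


def birth_time_distribution(t_now=13.7, dt=0.1, min_delay=0.05):
    """Birth-time distribution of binaries merging at t_now for a 1/t delay law.

    Weight of each birth-time bin = SFR(t_birth) / delay, with the delay
    floored at min_delay (Gyr) to avoid the 1/t divergence. Normalized so
    that sum(weights) * dt == 1.
    """
    n_bins = int(round(t_now / dt))
    times = [(i + 0.5) * dt for i in range(n_bins)]
    weights = [
        toy_low_metallicity_sfr(t) / max(t_now - t, min_delay) for t in times
    ]
    norm = sum(weights) * dt
    return times, [w / norm for w in weights]


times, weights = birth_time_distribution()
```

In this sketch the weights are high near 2 Gyr (where star formation peaks), drop at intermediate times and rise again towards the present, where short delays are favoured by the 1/t law.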

These massive GW150914-like mergers consist of black holes with
comparable masses. The vast majority (99.8%) of mergers are found
with mass ratios in the range q = 0.7–1.0 (Extended Data Fig. 3), with
the mass ratio of GW150914 (q = 0.82) falling near the centre of
the expected region. The formation of low-mass-ratio objects is
suppressed because low-mass-ratio progenitors tend to merge during the
first mass-transfer event when the more massive component overfills
its Roche lobe²¹. However, with decreasing total merger mass, the mass
ratio extends to lower values. In particular, for the lower mass bin of
Mtot,z = 25–37 M⊙, mass ratios as low as q = 0.3 are also found.

We now use our full sample of double compact object mergers to 
make predictions for the merger-rate density, detection rates and 
merger mass distribution. The results are shown in Fig. 3 and Extended 
Data Table 1, in which we compare them to the measured values 
inferred from O1 LIGO observations. We find an overall detection 
rate that is consistent with the detection of one significant candidate 
(GW150914) during the principal 16-day double coincident period 
(when both LIGO gravitational-wave interferometers are operating 
simultaneously) for our standard model (M1), but that is inconsistent 
for our other two models (optimistic M2 and pessimistic M3; more 
detail below). 

Figure 2 | Birth times of GW150914-like progenitors across cosmic time.
dR_det/dt represents the contribution to the detection rate from binaries in a
given 0.1-Gyr bin of birth time. Half of the binaries that form BH-BH mergers
detectable in O1 with total redshifted mass in the range Mtot,z = 54–73 M⊙
were born within 4.7 Gyr of the Big Bang (corresponding to z > 1.2). The
birth and merger times of the binary depicted in Fig. 1 are marked in blue;
this binary follows the most typical evolutionary channel for massive
BH-BH mergers (BHBH1 in Extended Data Table 1). The merger redshift of
GW150914 is z = 0.088. The bimodal shape of the distribution originates from
a combination of the BH-BH delay-time distribution and the low-metallicity
star-formation history (see Extended Data Fig. 2 for details).

The BH-BH merger rates inferred from the 16 days of O1 LIGO
observations are in the range 2–400 Gpc⁻³ yr⁻¹ (ref. 22). For comparison,
we estimate the rate density of binary black holes from our population
synthesis dataset. We consider the full population of binary black
holes within a redshift of z = 0.1 (that is, not weighted by their
detection probability) and calculate their average source-frame
merger-rate density. We find a value of 218 Gpc⁻³ yr⁻¹ for our standard model
(M1), which is in good agreement with the inferred LIGO rate²². By
contrast, our optimistic model (M2) predicts too many mergers, with
a rate density of 1,303 Gpc⁻³ yr⁻¹, and our pessimistic model (M3)
is at the very bottom end of the allowable range with a predicted rate
of 6.6 Gpc⁻³ yr⁻¹. In our models, the BH-BH merger-rate density
increases with redshift (Extended Data Fig. 4). This increase is modest;
our predicted source-frame BH-BH merger-rate density would double
if the cut-off redshift was increased from z = 0.1 to z = 0.6.
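
The comparison between the three model rate densities and the LIGO-inferred interval is simple arithmetic, sketched below. The numbers are those quoted in the text; the dictionary layout and function name are illustrative choices, and treating the 2–400 Gpc⁻³ yr⁻¹ interval as a hard pass/fail window is a simplification of the statistical comparison.

```python
LIGO_RATE_RANGE = (2.0, 400.0)  # Gpc^-3 yr^-1, inferred from O1 (ref. 22)

MODEL_RATES = {      # source-frame merger-rate density within z = 0.1
    "M1": 218.0,     # standard
    "M2": 1303.0,    # optimistic common envelope
    "M3": 6.6,       # pessimistic, large natal kicks
}


def classify_rate(rate, allowed=LIGO_RATE_RANGE):
    """Return 'below', 'within' or 'above' relative to the allowed interval."""
    low, high = allowed
    if rate < low:
        return "below"
    if rate > high:
        return "above"
    return "within"


for name, rate in MODEL_RATES.items():
    ratio_to_upper = rate / LIGO_RATE_RANGE[1]
    print(f"{name}: {rate:g} Gpc^-3 yr^-1 -> {classify_rate(rate)} "
          f"({ratio_to_upper:.2f}x the upper bound)")
```

On these numbers M1 and M3 fall inside the interval (M3 only just above its lower edge), while M2 exceeds the upper bound by a factor of about three.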

The merger-rate density for the model with an optimistic 
common-envelope phase (M2) is an order of magnitude larger than 
the rate estimate from LIGO. This implies that unevolved massive stars 
(during main sequence and Hertzsprung gap) do not initiate/survive 
the common-envelope phase. In our classical BH-BH formation
scheme, only evolved stars (during core helium burning) with well- 
developed convective envelopes are allowed to initiate and survive the 
common-envelope phase. 

Our predictions for the pessimistic model (M3) imply that large natal 
kicks (with average magnitudes of more than about 400 km s~') are 
unlikely for massive black holes. This model predicts that an event such 
as GW150914 would happen only 1% of the time, with the detection 
of any BH-BH system happening less than 10% of the time (Table 1). 
In principle, this conclusion applies to the formation of only the first 
black hole in the binary, because large natal kicks lead to disruption of 
BH-BH progenitors while the binaries are wide. During the formation 
of the second black hole, the progenitor binaries are on very close orbits 
(Fig. 1) and are not disrupted by natal kicks. In Extended Data Fig. 4 we 
show a sequence of models with intermediate black-hole natal kicks; 
future observations may allow us to discriminate between these models 
and to constrain the natal-kick distribution. Future observations
converging on M1 would indicate neither natal kicks nor supernova explosions
in massive black-hole formation. A striking ramification of this is the
prediction that hot and luminous Wolf-Rayet progenitors of massive
black holes should disappear from the sky as a result of direct collapse
to a black hole (that is, with no supernova explosion). Targeted observational
campaigns to search for such phenomena are already underway.
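
The percentages quoted above for the pessimistic model can be checked against the expected counts in Table 1. The following is a minimal sketch, assuming detections behave as independent Poisson events; that assumption and the function name are introduced here for illustration and are not taken from the paper.

```python
import math


def prob_at_least_one(expected_count: float) -> float:
    """Poisson probability of one or more events for a given expected count."""
    return 1.0 - math.exp(-expected_count)


# Expected numbers of detections in 16 days of O1 for model M3 (Table 1).
m3_any_bhbh = 0.085
m3_gw150914_like = 0.012

p_any = prob_at_least_one(m3_any_bhbh)
p_like = prob_at_least_one(m3_gw150914_like)

print(f"P(at least one BH-BH detection)         = {p_any:.3f}")
print(f"P(at least one GW150914-like detection) = {p_like:.3f}")
```

Under this assumption, the probability of any BH-BH detection stays below 10% and that of a GW150914-like detection is close to 1%, in line with the statements in the text.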

Figure 3 shows the relative contribution to the overall merger-rate density
associated with each bin of total redshifted merger mass M_tot,z. For
comparison, Fig. 3 also shows the fiducial sensitivity (see Methods) as
Figure 3 | Comparison of merger rates and masses with O1 LIGO
results. Results are shown for standard (M1; red solid lines), optimistic
common-envelope phase (M2; pink dash-dotted lines) and pessimistic
large black-hole kicks (M3; green/black solid/dash-dotted line) models.
a, Distribution of total redshifted binary mass. The merger-rate density of
GW150914 (70.5M⊙) is indicated by the blue square (with 90% confidence
interval in mass, and its vertical position arbitrary). The blue solid line
shows the fiducial estimate of the sensitivity (or upper limits) of the 16-day
O1 run. A comparison of the shapes of the blue and red lines suggests
that the most likely detections for M1 are BH-BH mergers with masses in
the range 25M⊙–73M⊙. NS-NS mergers (first bin) and BH-NS mergers
(next five bins) are well below the estimated sensitivity and thus detections
in O1 are not expected. The rate densities are in the detector rest frame.
b, Comparison of the LIGO estimate of the BH-BH merger rate with
our models. The LIGO value of 2–400 Gpc⁻³ yr⁻¹ (90% credible range)
compares well with our standard (M1) and large black-hole natal kicks (M3)
models. The rate densities are in the source rest frame. An updated version
of Fig. 3, including additional gravitational-wave detections as they occur,
can be found at http://www.syntheticuniverse.org/stvsgwo.html.


a function of mass, assuming equal-mass zero-spin binary black holes. 
Figure 3 demonstrates that the intersection of the strongly mass-dependent 
sensitivity and the intrinsic detectable mass distribution strongly favours 
sources with total redshifted masses of 25M⊙–73M⊙, consistent with
recent work and the total redshifted mass of GW150914 (M_tot,z = 70.5M⊙).
In our simulations, the maximum intrinsic mass of a merging BH-BH
binary is M_tot = 140M⊙. When accounting for cosmological redshift


Table 1 | Expected detection rate and number of detections

Model   Merger type   O1 detection rate (yr⁻¹)   Number of detections in 16 days of O1
M1      All           63.18                      2.770
        NS-NS         0.052                      0.002
        BH-NS         0.231                      0.010
        BH-BH         62.90                      2.758
        GW150914      11.95                      0.524
M2      All           476.1                      20.87
        NS-NS         0.191                      0.008
        BH-NS         0.796                      0.035
        BH-BH         475.1                      20.83
        GW150914      110.0                      4.823
M3      All           1.985                      0.087
        NS-NS         0.039                      0.002
        BH-NS         0.014                      0.001
        BH-BH         1.932                      0.085
        GW150914      0.270                      0.012

The first column indicates the model: standard (M1), optimistic common-envelope phase (M2),
and large black-hole kicks (M3). The third column lists the expected detection rate R_det per unit
double coincident time (both LIGO detectors operating at appropriate sensitivity), for a network
comparable to O1, for different classes of mergers (indicated in the second column). The fourth
column shows R_det T, where T = 16 days is the analysis time relevant for the rate estimate for
GW150914 (ref. 22). Entries for merger type 'GW150914' are for the subpopulation of BH-BH
mergers with total redshifted mass in the range M_tot,z = 54M⊙–73M⊙.
(M_tot,z = (1 + z)M_tot) and taking into account the advanced O1 horizon
redshift for this most massive binary (z = 0.7), the highest possible
observed mass within O1 would be approximately 240M⊙.
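
As a quick arithmetic check of the figure quoted above, the redshifted mass can be evaluated directly. This is a minimal sketch; the function name is hypothetical and the two inputs are the values stated in the text.

```python
def redshifted_mass(m_tot_source: float, z: float) -> float:
    """Observer-frame (redshifted) total mass: M_tot,z = (1 + z) * M_tot."""
    return (1.0 + z) * m_tot_source


# Most massive merging binary in the simulations, placed at the O1 horizon redshift.
max_observed_mass = redshifted_mass(140.0, 0.7)
print(f"Maximum redshifted total mass: {max_observed_mass:.0f} Msun")
```

The result, 238 solar masses, rounds to the approximate value of 240M⊙ given in the text.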

Spin magnitudes and directions of merging black holes are potentially
measurable by LIGO. The second-born black hole in a BH-BH
binary does not accrete mass, and its spin at merger is unchanged
from its spin at birth. The first-born black hole, on the other hand,
has a chance to accrete material from the stellar wind of the unevolved
companion or during common-envelope evolution. However, because
this is limited either by the very low efficiency of accretion from
stellar winds or by inefficient accretion during common-envelope
evolution, the total accreted mass onto the first-born black hole
is expected to be rather small (about 1M⊙–2M⊙). This is insufficient
to significantly increase the spin, and thus the spin magnitude of the
first-born black hole at merger is within about 10% of its birth spin.

In our modelling, we assume that stars that are born in a binary have 
their spins aligned with the angular-momentum vector of the binary. 
If massive black holes do not receive natal kicks (for example, in our 
standard model M1), then our prediction is that black-hole spins are 
aligned during the final massive BH-BH merger. We note that our 
standard model includes natal kicks and mass loss for low-mass black 
holes (less than about 10M⊙), and therefore BH-BH binaries with one
or two low-mass black holes may show misalignment. Alternatively,
binaries could be born with misalignment and retain it, misalignment
could be caused by a third body or by interaction between the radiative
envelope and the convective core, or misalignment could result
from a large natal kick on the second-born black hole. Several binaries
are reported with misaligned spins. Therefore, spin alignment of
massive merging black holes suggests isolated field evolution, while 
misaligned spins do not elucidate formation processes. 

As shown in Fig. 1, we find that the formation of massive BH-BH 
mergers is a natural consequence of isolated binary evolution. Our 
standard model (M1) of BH-BH mergers fully accounts for the observed 
merger-rate density and merger mass (Fig. 3), and for the mass ratio of two 
merging black holes (Extended Data Fig. 3) inferred from GW150914. 

Our standard formation mechanism (M1) produces significantly more
binary black holes than do alternative, dynamical channels associated
with globular clusters. A recent study suggests globular clusters could
produce a typical merger rate of 5 Gpc⁻³ yr⁻¹; our standard model (M1)
BH-BH merger-rate density is about 40 times larger: 218 Gpc⁻³ yr⁻¹.

However, one non-classical isolated binary evolution channel involving
rapidly rotating stars (homogeneous evolution) in very close binaries
may also fully account for the formation of GW150914 (refs 12-15).
In particular, typical rates of 1.8 detections in 16 days of O1 observations
are found, which is comparable to our prediction of 2.8 (Table 1). Only
very massive BH-BH mergers with total intrinsic masses of more than
about 50M⊙ are formed in this model, whereas our model predicts
mergers with masses in a broader range, down to about
10M⊙. Future LIGO observations of BH-BH mergers may allow us to
discriminate between these two very different mass distributions/models.


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


Our Monte Carlo evolutionary modelling is performed with the StarTrack binary
population synthesis code. In particular, we incorporate a calibrated treatment
of tidal interactions in close binaries, a physical measure of the common envelope
(CE) binding energy, and a rapid-explosion supernova model that reproduces
the observed mass gap between neutron stars and black holes (BHs). Our
updated mass spectrum of BHs shows a strong dependence on the metallicity of
the progenitor stars (Extended Data Fig. 5). In galaxies with metallicities similar to
the Milky Way (Z = Z⊙ = 0.02), BHs that formed out of single massive stars (initial
mass M_ZAMS = 150M⊙) reach a maximum mass of M_BH = 15M⊙, whereas, for very
low metallicity (Z = 0.0001 = 0.005Z⊙), the maximum mass becomes M_BH = 94M⊙.
The above input physics represents our standard model (M1), which is
representative of our classical formation scheme for double compact objects (BH-BH,
BH-NS and NS-NS).

We have adopted specific values for a number of evolutionary parameters.
Single stars are evolved with calibrated formulae based on detailed
evolutionary calculations. Massive star winds are adopted from detailed studies
of radiation-driven mass loss. For the Luminous Blue Variable phase, a high
rate of mass loss is adopted (1.5 × 10⁻⁴ M⊙ yr⁻¹). Binary interactions and, in
particular, the stability of Roche-lobe overflow (RLOF) are judged on the basis
of binary parameters: mass ratio, evolutionary stage of donor, response to mass
loss, and behaviour of the orbital separation in response to mass transfer. The
orbital separation is additionally affected by gravitational radiation, magnetic
braking, and loss of angular momentum associated with systemic mass loss.
During stable RLOF, we assume that half of the mass is accreted onto the
companion, while the other half (1 − f_a = 0.5) is lost with specific angular momentum
(dJ/dt = j_loss [J_orb/(M_don + M_acc)](1 − f_a) dM_RLOF/dt with scaling factor j_loss = 1.0,
where f_a is the fraction of the mass accreted, J_orb is the orbital angular momentum,
M_don is the donor mass, M_acc is the accretor mass, and dM_RLOF/dt is the mass-transfer
rate; ref. 35). The CE is treated by considering the energy balance with fully
effective conversion of orbital energy into envelope ejection (conversion efficiency
α = 1.0), whereas the envelope binding energy for massive stars is calibrated by
a parameter λ, which depends on star radius, mass and metallicity. For massive
stars, λ ≈ 0.1 is adopted. During CE evolution, compact objects accrete at 10%
of the Bondi-Hoyle rate as estimated by recent hydrodynamical simulations.
Our CE evolution is instantaneous, so the time at the beginning and end of the
CE phase is exactly the same (see Fig. 1); the time duration of the CE phase has
no impact on our results.
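
The angular-momentum loss prescription for stable RLOF quoted above can be written out as a short function. This is a minimal sketch of that single formula only, not of the StarTrack code (which is not released); the function name, the choice of arbitrary consistent units and the example numbers are assumptions made for illustration.

```python
def angular_momentum_loss_rate(j_orb, m_don, m_acc, mdot_rlof, f_a=0.5, j_loss=1.0):
    """Rate of orbital angular-momentum loss during stable RLOF.

    Implements dJ/dt = j_loss * [J_orb / (M_don + M_acc)] * (1 - f_a) * dM_RLOF/dt.
    The mass-transfer rate is taken as a positive number here, so the return
    value is the magnitude of the loss rate (a convention chosen for this sketch).
    """
    return j_loss * (j_orb / (m_don + m_acc)) * (1.0 - f_a) * mdot_rlof


# Illustrative numbers only: J_orb in arbitrary units, masses in Msun,
# mass-transfer rate in Msun per year.
rate = angular_momentum_loss_rate(j_orb=1.0, m_don=40.0, m_acc=30.0, mdot_rlof=1e-3)
conservative = angular_momentum_loss_rate(1.0, 40.0, 30.0, 1e-3, f_a=1.0)

print(f"dJ/dt for f_a = 0.5: {rate:.3e} (J_orb units per year)")
print(f"dJ/dt for f_a = 1.0: {conservative:.3e}")
```

With f_a = 1 all transferred mass is accreted and the expression gives no angular-momentum loss, which is a useful sanity check on the sign and scaling of the formula.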

We consider two extra variations of the input physics of binary evolution. In one
model (M2), we test highly uncertain CE physics and we allow for Hertzsprung-gap
stars to initiate and survive CE evolution. This is an optimistic assumption,
because these stars may not allow for CE evolution, nor survive as a binary if a
CE forms. For comparison, in our standard model, we allow only evolved stars
with a deep convective envelope (core-helium-burning stars) to survive a CE phase.

In the opposite extreme, we use a model (M3) in which BHs receive large natal
kicks. In particular, each BH gets a natal kick with its components drawn from a
Maxwellian distribution with a one-dimensional root-mean-square σ = 265 km s⁻¹,
independent of BH mass. Such large natal kicks are measured for Galactic pulsars.
This is a pessimistic assumption, because large natal kicks tend to disrupt
BH-BH progenitor binaries. This assumption is not yet excluded on the basis of
electromagnetic observations. By contrast, in our standard model, BH natal kicks
decrease with BH mass. In particular, for massive BHs that form through direct
collapse of an entire star to a BH with no supernova explosion (M_BH ≳ 10M⊙ for
Z = Z⊙; M_BH ≳ 15M⊙ for Z = 0.1Z⊙ and M_BH ≳ 15M⊙–30M⊙ for Z = 0.01Z⊙), we
assume no natal kicks. We also calculated a series of models with intermediate
BH kicks (see Extended Data Fig. 4): σ = 200 km s⁻¹ (model M4), σ = 130 km s⁻¹
(model M5) and σ = 70 km s⁻¹ (model M6).
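
The natal-kick prescription described above can be illustrated by drawing kick speeds whose three Cartesian components are independent Gaussians with the stated one-dimensional root-mean-square. This is a minimal sketch using the Python standard library; the function name, sample size and random seed are assumptions for illustration and do not reproduce the StarTrack implementation.

```python
import math
import random


def draw_natal_kick(sigma_km_s: float, rng: random.Random) -> float:
    """Kick speed with three independent Gaussian components of 1D rms sigma."""
    vx, vy, vz = (rng.gauss(0.0, sigma_km_s) for _ in range(3))
    return math.sqrt(vx * vx + vy * vy + vz * vz)


# One-dimensional rms values quoted in the text for the kick models (km/s).
SIGMAS = {"M3": 265.0, "M4": 200.0, "M5": 130.0, "M6": 70.0}

rng = random.Random(42)
n_draws = 50_000
mean_speed = {}
for model, sigma in SIGMAS.items():
    speeds = [draw_natal_kick(sigma, rng) for _ in range(n_draws)]
    mean_speed[model] = sum(speeds) / n_draws
    print(f"{model}: sigma = {sigma:5.1f} km/s, mean kick speed = {mean_speed[model]:6.1f} km/s")
```

For a Maxwellian the mean speed is 2σ√(2/π), which is about 423 km s⁻¹ for σ = 265 km s⁻¹; the sampled mean for M3 should land close to that value.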

For each evolutionary model we compute 2 × 10⁷ massive binaries for each
point on a grid of 32 sub-models covering a wide range of metallicities: Z = 0.0001,
0.0002, 0.0003, 0.0004, 0.0005, 0.0006, 0.0007, 0.0008, 0.0009, 0.001, 0.0015, 0.002,
0.0025, 0.003, 0.0035, 0.004, 0.0045, 0.005, 0.0055, 0.006, 0.0065, 0.007, 0.0075,
0.008, 0.0085, 0.009, 0.0095, 0.01, 0.015, 0.02, 0.025 and 0.03. We assume that stellar
evolution at even lower metallicities proceeds in the same way as the evolution at
Z = 0.005Z⊙. However, stars with very low metal content (for example, Population
III) may evolve differently to metal-rich stars.
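
The metallicity grid listed above follows three regular spacings, so it can be generated and checked programmatically. The sketch below is illustrative only: the function names are hypothetical, and the nearest-neighbour lookup in log space is an assumption of this example (the text states only that metallicities below the grid are treated like the lowest grid point).

```python
import math


def metallicity_grid():
    """The 32 metallicities listed in the text, built from three step sizes."""
    fine = [round(1e-4 * k, 4) for k in range(1, 11)]     # 0.0001 ... 0.001
    medium = [round(5e-4 * k, 4) for k in range(3, 21)]   # 0.0015 ... 0.01
    coarse = [round(5e-3 * k, 3) for k in range(3, 7)]    # 0.015 ... 0.03
    return fine + medium + coarse


def nearest_grid_metallicity(z_value: float, grid) -> float:
    """Closest grid point in log space; values below the grid map to its minimum."""
    if z_value <= grid[0]:
        return grid[0]
    return min(grid, key=lambda z: abs(math.log10(z) - math.log10(z_value)))


grid = metallicity_grid()
print(len(grid), "sub-models from Z =", grid[0], "to Z =", grid[-1])
print("Z = 1e-6 is evolved with the Z =", nearest_grid_metallicity(1e-6, grid), "sub-model")
```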

Each sub-model is computed with initial distributions of orbital periods P
(proportional to [log(P)]^−0.5), eccentricities e (proportional to e^−0.42) and mass
ratios q (proportional to q^0) appropriate for massive stars. We adopt an initial
mass function that is close to flat for low-mass stars (proportional to M^−1.3 for
0.08M⊙ < M < 0.5M⊙ and to M^−2.2 for 0.5M⊙ < M < 1.0M⊙) and that is top-heavy
for massive stars (proportional to M^−2.3 for 1.0M⊙ < M < 150M⊙), as guided by
recent observations. The adopted initial mass function generates higher BH-BH
merger-rate densities as compared with the steeper initial mass function
(proportional to M^−2.7 for 1.0M⊙ < M < 150M⊙) adopted in previous studies, because
there are more BH-BH merger progenitors in our simulations.
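
A three-segment power-law initial mass function of this kind can be sampled by inverse-transform sampling within each segment. The sketch below takes the slopes as 1.3, 2.2 and 2.3 on the mass ranges given in the text and assumes the function is continuous at the break masses; the continuity condition, function names, seed and sample size are assumptions of this illustration, not details confirmed by the paper.

```python
import random

# (lower mass, upper mass, alpha) with dN/dM proportional to M^-alpha; masses in Msun.
IMF_SEGMENTS = [(0.08, 0.5, 1.3), (0.5, 1.0, 2.2), (1.0, 150.0, 2.3)]


def segment_weights(segments):
    """Relative number of stars per segment for an IMF continuous at the breaks."""
    weights = []
    coeff = 1.0          # normalisation of the first segment (overall scale is arbitrary)
    prev_alpha = None
    for lo, hi, alpha in segments:
        if prev_alpha is not None:
            # Continuity at M = lo: coeff_old * lo^-prev_alpha = coeff_new * lo^-alpha.
            coeff *= lo ** (alpha - prev_alpha)
        weights.append(coeff * (hi ** (1 - alpha) - lo ** (1 - alpha)) / (1 - alpha))
        prev_alpha = alpha
    return weights


def sample_mass(rng, segments=IMF_SEGMENTS):
    """Pick a segment by its weight, then invert the power-law CDF inside it."""
    lo, hi, alpha = rng.choices(segments, weights=segment_weights(segments), k=1)[0]
    a, b = lo ** (1 - alpha), hi ** (1 - alpha)
    return (a + rng.random() * (b - a)) ** (1 / (1 - alpha))


rng = random.Random(1)
masses = [sample_mass(rng) for _ in range(20_000)]
frac_massive = sum(m > 1.0 for m in masses) / len(masses)
print(f"Fraction of stars above 1 Msun: {frac_massive:.3f}")
```

Under these assumptions roughly one star in ten is born above 1M⊙; steepening the last slope to 2.7 lowers the relative number of very massive stars, which is the sense of the comparison made in the text.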

A moderate binary fraction (f_bi = 0.5) is adopted for stars with masses
M_ZAMS < 10M⊙, whereas we assume that all more massive stars are formed in
binaries (f_bi = 1.0), as indicated by recent empirical estimates.

We adopt an extinction-corrected cosmic star-formation rate (SFR) based on
numerous multi-wavelength observations:

$$ \mathrm{SFR}(z) = 0.015\,\frac{(1+z)^{2.7}}{1 + [(1+z)/2.9]^{5.6}}\; M_\odot\,\mathrm{Mpc}^{-3}\,\mathrm{yr}^{-1} \qquad (1) $$

This SFR declines rapidly at high redshifts (z > 2). This may be contrasted with
some SFR models used previously, which generated a greater number of stars
at high redshifts. This revision will thus reduce the BH-BH merger-rate densities
at all redshifts. Even though the formation of BH-BH binaries takes a very short
time (about 5 Myr), the time to coalescence of two BHs may be very large (Fig. 1
and Extended Data Fig. 2).

In our treatment of chemical enrichment of the Universe, we follow the mean
metallicity increase with cosmic time (from the Big Bang until the present). The mean
metallicity as a function of redshift is:

$$ \log[Z_{\rm mean}(z)] = 0.5 + \log\left( \frac{y(1-R)}{\rho_b} \int_z^{20} \frac{97.8 \times 10^{10}\,\mathrm{SFR}(z')}{H_0\,E(z')\,(1+z')}\,dz' \right) $$

with a return fraction R = 0.27 (mass fraction of each generation of stars that is put
back into the interstellar medium), a net metal yield y = 0.019 (mass of new metal
created and ejected into the interstellar medium by each generation of stars
per unit mass locked in stars), a baryon density ρ_b = 2.77 × 10¹¹ Ω_b h₀² M⊙ Mpc⁻³
with Ω_b = 0.045 and h₀ = 0.7, a SFR given by equation (1), and
E(z) = √[Ω_M(1 + z)³ + Ω_k(1 + z)² + Ω_Λ] with Ω_Λ = 0.7, Ω_M = 0.3, Ω_k = 0 and
H₀ = 70.0 km s⁻¹ Mpc⁻¹. The shape of the mean-metallicity dependence on redshift
follows recent estimates, although the level was increased by 0.5 dex to better fit
observational data. At each redshift, we assume a log-normal distribution of
metallicity around the mean, with a standard deviation of σ = 0.5 dex (ref. 47). Our
prescription (Extended Data Fig. 6) produces more low-metallicity stars than
previously. Because BH-BH formation is enhanced at low metallicity, our new
approach increases the predicted rate densities of BH-BH mergers.
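
The star-formation history and the mean-metallicity relation described above can be evaluated numerically with a few lines of code. The sketch below assumes the SFR form 0.015(1 + z)^2.7 / {1 + [(1 + z)/2.9]^5.6}, an upper integration limit of z = 20, a base-10 logarithm and a simple trapezoidal rule; the function names and the step count are choices made for this illustration, not part of the authors' code.

```python
import math

H0 = 70.0                                   # km s^-1 Mpc^-1
OMEGA_M, OMEGA_K, OMEGA_L = 0.3, 0.0, 0.7
OMEGA_B, H_SMALL = 0.045, 0.7
RETURN_FRACTION = 0.27                      # R
METAL_YIELD = 0.019                         # y
RHO_B = 2.77e11 * OMEGA_B * H_SMALL ** 2    # baryon density, Msun Mpc^-3


def sfr(z):
    """Cosmic star-formation rate density in Msun Mpc^-3 yr^-1 (equation (1))."""
    return 0.015 * (1 + z) ** 2.7 / (1 + ((1 + z) / 2.9) ** 5.6)


def e_of_z(z):
    """Dimensionless expansion rate E(z)."""
    return math.sqrt(OMEGA_M * (1 + z) ** 3 + OMEGA_K * (1 + z) ** 2 + OMEGA_L)


def log_mean_metallicity(z, z_max=20.0, steps=4000):
    """log10 of the mean metallicity at redshift z, by trapezoidal integration."""
    if not 0.0 <= z < z_max:
        raise ValueError("z must lie in [0, z_max)")

    def integrand(zp):
        # 97.8e10 / H0 converts 1/H0 into years when H0 is in km s^-1 Mpc^-1.
        return 97.8e10 * sfr(zp) / (H0 * e_of_z(zp) * (1 + zp))

    dz = (z_max - z) / steps
    total = 0.5 * (integrand(z) + integrand(z_max))
    for i in range(1, steps):
        total += integrand(z + i * dz)
    total *= dz
    return 0.5 + math.log10(METAL_YIELD * (1 - RETURN_FRACTION) / RHO_B * total)


for z in (0.0, 2.0, 6.0):
    print(f"z = {z:3.1f}: SFR = {sfr(z):.4f}, log Z_mean = {log_mean_metallicity(z):.2f}")
```

Because the integrand is positive, the mean metallicity computed this way can only fall with increasing redshift, while the SFR itself peaks near z ≈ 2 before declining.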

Here we discuss caveats of our evolutionary calculations. First, we consider only
isolated binary evolution, and thus our approach is applicable to field stars in
low-density environments. It is possible that dynamical interactions enhance
BH-BH merger formation in dense globular clusters, offering a completely
independent channel.

Second, our predictions are based on a ‘classical’ theory of stellar and binary 
evolution for the modelling of massive stars that we have compiled, developed 
and calibrated over the last 15 years. We do not consider exotic channels for the 
formation of BH-BH mergers, such as the one from rapidly rotating stars in
contact binaries.

Third, our modelling includes only three evolutionary models: a standard 
model consisting of our best estimates for reasonable parameters (M1), as well as 
optimistic (M2) and pessimistic (M3) alternative models. The optimistic model 
consists of only one change from the standard model: we allow all stars beyond the 
main sequence to survive the CE phase. Alternatively, the pessimistic model also 
consists of only one change: larger BH natal kicks. We have not investigated other 
possible deviations from the standard model (for example, different assumptions 
of mass and angular-momentum loss during stable mass-transfer evolution) nor 
have we checked inter-parameter degeneracies (for example, models with large BH 
kicks and an optimistic CE phase). Precursor versions of these computationally 
demanding studies have already been performed, albeit with low statistics and
limited scope; these calculations indicate that our three models probably cover the 
range of interesting effects. 

Fourth, our observations are severely statistically limited. We are attempting to 
draw inferences about our models on the basis of a single detection (GW150914). 

It was argued that the formation of GW150914 in isolated binary evolution
requires a metallicity lower than 0.5Z⊙. This argument was based on single stellar
models; stars in close binaries are subject to significant mass loss during RLOF/CE,
and they form BHs with lower mass than BHs formed by single stars. Thus,
in binaries, the metallicity threshold for massive BH formation is lower than in
single stellar evolution. For example, formation of a single 30M⊙ BH requires
Z < 0.25Z⊙ (Extended Data Fig. 5), whereas formation of two such BHs in a binary
requires Z < 0.10Z⊙ (Extended Data Fig. 1). The value of this threshold depends
on assumptions for the model of stellar evolution, winds and BH formation
processes. The physical models we have adopted yield a threshold of Z < 0.10Z⊙,
the same as that obtained with MESA (http://mesa.sourceforge.net/) for
homogeneous stellar evolution. Our model was calibrated using known masses of
BHs and, in particular, we do not exceed 15M⊙ for Z⊙ (the highest-mass stellar
BH known in our Galaxy). By contrast, single stellar models used to derive the
high metallicity threshold produce 25M⊙ for Z⊙ (ref. 51). The highest threshold
obtained with binary evolution was reported at the level of 0.5Z⊙ (ref. 14). Such
a high value of the metallicity threshold for the progenitor of GW150914 implies
that stars at approximately solar metallicity (Z = 0.014) produce BHs as massive as
40M⊙ (ref. 14). This is neither supported nor excluded by available electromagnetic
BH mass measurements (https://stellarcollapse.org/bhmasses).

In the following, we present our calculation of the gravitational radiation signal.
The output of StarTrack is a binary merger at a given time. We then calculate the
gravitational waveform associated with this merger, and determine whether this
binary would have been observable by LIGO in the O1 configuration.

We model the full inspiral-merger-ringdown waveform of the binaries using
the IMRPhenomD gravitational waveform template family. This is a simple
and fast waveform family that neglects the effects of spin (which are not relevant
for GW150914). We consider a detection to be given by a threshold of SNR > 8
in a single detector, and we use the fiducial O1 noise curve (https://dcc.ligo.org/
LIGO-G1501223/public). We calculate the face-on, overhead SNR for each binary
directly from equation (2) of ref. 3. We then calculate the luminosity distance at
which this binary would be detected with SNR = 8. As the distance to the binary
changes, the observer-frame (redshifted) mass also changes, and therefore
calculating the horizon redshift requires an iterative process. Once this has been
calculated, we then determine the predicted detection rates using equation (9) of
ref. 3; the effects of the antenna power pattern are incorporated in the p_det term
in this equation.
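
The need for iteration can be seen in a toy version of the horizon calculation: the horizon distance depends on the redshifted mass, which itself depends on the redshift being solved for. The sketch below is not the authors' procedure. It replaces the IMRPhenomD waveform and O1 noise curve with a hypothetical inspiral-like scaling, horizon distance ∝ M_tot,z^(5/6), normalised so that M_tot,z = 70.5M⊙ gives a horizon redshift of 0.36 (the value quoted for GW150914 in Extended Data Fig. 7), and solves for the redshift by bisection in a flat cosmology with Ω_M = 0.3, Ω_Λ = 0.7 and H₀ = 70 km s⁻¹ Mpc⁻¹. All function names are illustrative.

```python
import math

H0 = 70.0                      # km s^-1 Mpc^-1
OMEGA_M, OMEGA_L = 0.3, 0.7
C_KM_S = 299792.458            # speed of light in km/s


def luminosity_distance_mpc(z, steps=2000):
    """Luminosity distance in a flat universe (trapezoidal integral of 1/E)."""
    if z <= 0.0:
        return 0.0

    def inv_e(zp):
        return 1.0 / math.sqrt(OMEGA_M * (1 + zp) ** 3 + OMEGA_L)

    dz = z / steps
    total = 0.5 * (inv_e(0.0) + inv_e(z))
    for i in range(1, steps):
        total += inv_e(i * dz)
    return (1 + z) * (C_KM_S / H0) * total * dz


# HYPOTHETICAL toy sensitivity: horizon distance scales as M_tot,z^(5/6),
# anchored to the quoted horizon redshift of 0.36 at M_tot,z = 70.5 Msun.
REF_MASS_Z = 70.5
REF_DISTANCE_MPC = luminosity_distance_mpc(0.36)


def toy_horizon_distance_mpc(m_tot_z):
    return REF_DISTANCE_MPC * (m_tot_z / REF_MASS_Z) ** (5.0 / 6.0)


def horizon_redshift(m_tot_source, z_hi=5.0, tol=1e-6):
    """Solve D_L(z) = D_hor((1 + z) * M_source) for z by bisection."""

    def mismatch(z):
        return luminosity_distance_mpc(z) - toy_horizon_distance_mpc((1 + z) * m_tot_source)

    z_lo = 0.0
    while z_hi - z_lo > tol:
        z_mid = 0.5 * (z_lo + z_hi)
        if mismatch(z_mid) < 0.0:
            z_lo = z_mid       # source still detectable: horizon lies farther out
        else:
            z_hi = z_mid
    return 0.5 * (z_lo + z_hi)


z_hor = horizon_redshift(70.5 / 1.36)
print(f"Toy horizon redshift for a source-frame mass of {70.5 / 1.36:.1f} Msun: {z_hor:.3f}")
```

The power-law scaling overestimates the reach for very massive systems, whose signals leave the sensitive band, so the numbers from this toy model should not be compared with Extended Data Fig. 7 beyond the normalisation point.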

An estimate of fiducial advanced LIGO sensitivity during the 16-day GW150914
analysis is shown in Fig. 3. We estimate the sensitivity to coalescing compact
binaries using a reference O1 noise curve. We assume that both detectors operate with
the fiducial O1 noise curve, which is the same sensitivity we adopted to calculate
compact binary detection rates. For comparison, this model agrees reasonably
well with the 'early-high' sensitivity model. Our expression is a 50th percentile
upper limit, assuming no detections. The critical application of this expression is
not related to its overall normalization; we are instead interested in its shape, which
characterizes the strongly mass-dependent selection biases of LIGO searches.

Using these inputs, our fiducial estimate of the advanced LIGO sensitivity
during the first 16 days of O1 for a specific mass bin ΔM is:

$$ R_{D,\Delta M,{\rm UL}} = \frac{0.7}{V_{\Delta M}\,T} $$

where T = 16 days corresponds to the analysis of GW150914, and the volume:

$$ V_{\Delta M} = \frac{1}{\Delta M_i} \int_{\Delta M_i} dM \int \frac{dz}{1+z}\,\frac{dV}{dz}\,p_{\rm det}(w, M) $$

is the sensitive volume averaged over mass bin ΔM_i, and p_det(w, M) is the
orientation-averaged detection probability. The function p_det(w, M) depends
on the coalescing binary redshifted mass M through the maximum luminosity
distance ('horizon distance') at which a source could produce a response of SNR > 8
in a single detector through a projection parameter w, which is maximum
(w = 1) for a face-on, overhead source, and minimum (w = 0) for sky locations and
orientations where the LIGO detector has no response to the source. To calculate
this distance, we adopt the IMRPhenomD gravitational waveforms that we
also used to estimate compact binary detection rates. Extended Data Fig. 7 shows
our estimated horizon redshift as a function of the total redshifted binary merger
mass for equal-mass mergers.
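
The numerical factor 0.7 in the upper-limit expression appears consistent with a median (50th percentile) Poisson limit for zero detections, for which the expected count μ satisfies exp(−μ) = 0.5, so μ = ln 2 ≈ 0.69. The sketch below illustrates that reading; the interpretation, the function names and the sensitive volume of 1 Gpc³ used in the example are assumptions for illustration rather than values from the paper.

```python
import math


def zero_detection_limit_factor(confidence: float = 0.5) -> float:
    """Expected count mu solving exp(-mu) = 1 - confidence for zero detections."""
    return -math.log(1.0 - confidence)


def rate_upper_limit(sensitive_volume_gpc3: float, analysis_time_yr: float,
                     confidence: float = 0.5) -> float:
    """Upper limit on the rate density (Gpc^-3 yr^-1) given no detections."""
    return zero_detection_limit_factor(confidence) / (sensitive_volume_gpc3 * analysis_time_yr)


factor = zero_detection_limit_factor(0.5)
t_analysis_yr = 16.0 / 365.0                            # 16-day analysis time
example_limit = rate_upper_limit(1.0, t_analysis_yr)    # hypothetical 1 Gpc^3 volume

print(f"Median zero-detection factor: {factor:.3f}")
print(f"Rate upper limit for V = 1 Gpc^3, T = 16 d: {example_limit:.1f} Gpc^-3 yr^-1")
```

As the text notes, the overall normalisation matters less than the mass dependence, which enters entirely through the sensitive volume in the denominator.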
Code availability. We have opted not to release the population synthesis code
StarTrack used to generate binary populations for this study.
Extended Data Figure 1 | Maximum total mass of BH-BH mergers as
a function of metallicity. Binary stars at metallicities Z < 0.1Z⊙ can form
BH-BH mergers that are more massive than M_tot = 64.8M⊙. This suggests
that GW150914 was formed in a low-metallicity environment, assuming
it is a product of classical isolated binary evolution. The total binary
maximum BH-BH mass is not a simple sum of maximum BH masses
resulting from single stellar evolution; this is a result of mass loss during
the RLOF and CE evolution phases in the formation of massive BH-BH
mergers (Fig. 1).
Extended Data Figure 2 | Emergence of a bimodal birth-time
distribution. a, BH binaries follow an intrinsic power-law delay-time
distribution (proportional to t⁻¹). The birth time (t_birth = t_merger − t_delay)
is inverted compared to the delay-time distribution (blue line), with the
spread caused by allowing the merger time (t_merger) to fall anywhere within
the O1 LIGO horizon: z = 0–0.7; this generates a peak corresponding to
BH-BH progenitors born late with short delay times. b, Massive BH-BH
binaries are formed by only low-metallicity stars (Z < 0.10Z⊙). The
fraction of all stars that form at such low Z (F_Z) decreases with cosmic
time, making low-Z star formation (in units of M⊙ Mpc⁻³ yr⁻¹) peak
at early cosmic time. sfr, star-formation rate. c, The final birth-time
distribution for massive BH-BH mergers is a convolution of the intrinsic
birth times and the low-metallicity star-formation rate.
detector-frame mass ratio is shown. BH-BH binaries prefer mass ratios of
Extended Data Figure 4 | Source-frame merger-rate density for BH-BH
binaries as a function of redshift. The red line shows the results from our
standard model (M1); in this model, massive BHs do not get natal kicks.
A sequence of models with increasing BH natal kicks (models M6, M5,
M4, M3) is shown. The rate density decreases with increasing natal kick
strength described by a Maxwellian distribution with a one-dimensional
root mean square deviation of σ. The local merger-rate density (z < 0.1)
changes from 218 Gpc⁻³ yr⁻¹ (M1) to 63 Gpc⁻³ yr⁻¹ (M6), 25 Gpc⁻³ yr⁻¹
(M5), 11 Gpc⁻³ yr⁻¹ (M4) and 6.6 Gpc⁻³ yr⁻¹ (M3). The LIGO estimate
(2–400 Gpc⁻³ yr⁻¹) encompasses all of these models. We mark the O1
LIGO detection horizon (z = 0.7; see Extended Data Fig. 7).
Extended Data Figure 5 | BH mass as a function of initial star
mass, for a range of metallicities. These results show calculations for
single star evolution with no binary interactions. Our updated models
of BH formation show a general increase of BH mass with initial
progenitor star mass. There is strong dependence of BH mass on the
chemical composition of the progenitor. For example, the maximum
BH mass increases from 10M⊙–15M⊙ for high-metallicity progenitors
(Z = 1.5Z⊙–1Z⊙) to 94M⊙ for low-metallicity progenitors (Z = 0.005Z⊙).
The formation of a single 30M⊙ BH requires a metallicity of Z < 0.25Z⊙.
ZAMS, zero-age main sequence.
Extended Data Figure 6 | Mean-metallicity evolution of the Universe
with redshift. It is assumed that at each redshift the metallicity
distribution is log-normal with a standard deviation of σ = 0.5 dex. The
blue line denotes the mean-metallicity evolution adopted in previous
studies. The new relation generates more low-metallicity stars at all
redshifts. We mark the line above which we can make predictions
(log(Z/Z⊙) = −2.3, Z⊙ = 0.02; ref. 55) based on actual evolutionary stellar
models adopted in our calculations. Below this line we assume that stars
produce BH-BH mergers in the same way as in the case of our lowest
available model.
Extended Data Figure 7 | Horizon redshift for the first advanced LIGO observational run (O1). Horizon is given as a function of the total redshifted
binary merger mass (assuming equal-mass mergers). For the highest-mass mergers found in our simulations (M_tot,z = 240M⊙), the horizon redshift is
z_hor = 0.7. For GW150914 (M_tot,z = 70.5M⊙), the horizon redshift is z_hor = 0.36.
Extended Data Table 1 | Formation channels of massive BH-BH mergers (M1)

Channel   Evolutionary sequence               all [%]   high-z   mid-z    low-z
BHBH1     MT1(2-1) BH1 CE2(14-4;14-7) BH2     79.481    38.045   18.673   22.763
BHBH2     MT1(4-1) BH1 CE2(14-4;14-7) BH2     13.461    10.766   1.101    1.594
BHBH3     MT1(4-4) CE2(4/7-4;7-7) BH1 BH2     5.363     4.852    0.194    0.317
Other     additional combinations             1.696     0.625    0.421    0.649

The first two columns identify evolutionary sequences leading to the formation of BH-BH mergers with M_tot,z = 54M⊙–73M⊙. The third column lists the formation efficiency. The last three columns list
the formation efficiency of BH-BH progenitors born at z > 1.12, 0.34 < z < 1.12 and z < 0.34. Notation: stable mass transfer (MT), common envelope (CE), BH formation (BH) initiated by either the primary
star (1) or the secondary star (2). In parentheses we give the evolutionary stage of stars during MT/(pre-;post-)CE: main sequence (1), Hertzsprung gap (2), core helium-burning (4), helium star (7) or
BH (14), with the primary star listed first.
Real-time dynamics of lattice gauge theories with a
few-qubit quantum computer


Esteban A. Martinez, Christine A. Muschik, Philipp Schindler, Daniel Nigg, Alexander Erhard, Markus Heyl,
Philipp Hauke, Marcello Dalmonte, Thomas Monz, Peter Zoller & Rainer Blatt


Gauge theories are fundamental to our understanding of
interactions between the elementary constituents of matter as
mediated by gauge bosons. However, computing the real-time
dynamics in gauge theories is a notorious challenge for classical
computational methods. This has recently stimulated theoretical
effort, using Feynman's idea of a quantum simulator, to devise
schemes for simulating such theories on engineered quantum-mechanical
devices, with the difficulty that gauge invariance and
the associated local conservation laws (Gauss laws) need to be
implemented. Here we report the experimental demonstration
of a digital quantum simulation of a lattice gauge theory, by
realizing (1 + 1)-dimensional quantum electrodynamics (the
Schwinger model) on a few-qubit trapped-ion quantum computer.
We are interested in the real-time evolution of the Schwinger
mechanism, describing the instability of the bare vacuum due
to quantum fluctuations, which manifests itself in the spontaneous
creation of electron-positron pairs. To make efficient use of our
quantum resources, we map the original problem to a spin model
by eliminating the gauge fields in favour of exotic long-range
interactions, which can be directly and efficiently implemented on
an ion trap architecture. We explore the Schwinger mechanism of
particle-antiparticle generation by monitoring the mass production
and the vacuum persistence amplitude. Moreover, we track the real-time
evolution of entanglement in the system, which illustrates how
particle creation and entanglement generation are directly related.
Our work represents a first step towards quantum simulation of
high-energy theories using atomic physics experiments; the long-term
intention is to extend this approach to real-time quantum
simulations of non-Abelian lattice gauge theories.

Small-scale quantum computers exist today in the laboratory as
programmable quantum devices. In particular, trapped-ion quantum
computers provide a platform allowing a few hundred coherent
quantum gates to act on a few qubits, with a clear roadmap towards
scaling up these devices. This provides the tools for universal digital
quantum simulation, where the time evolution of a quantum system
is approximated as a stroboscopic sequence of quantum gates. Here
we show how this technology can be used to simulate the real-time
dynamics of a minimal model of a lattice gauge theory, realizing the
Schwinger model as a one-dimensional quantum field theory with a
chain of trapped ions (Fig. 1).

Our few-qubit demonstration is a first step towards simulating
real-time dynamics in gauge theories: such simulations are fundamental
for the understanding of many physical phenomena, including
thermalization after heavy-ion collisions and pair creation studied at
high-intensity laser facilities. Although existing classical numerical
methods such as quantum Monte Carlo have been remarkably successful
for describing equilibrium phenomena, no systematic techniques
exist to tackle the dynamical long-time behaviour of all but very small
systems. In contrast, quantum simulations aim at the long-term goal
of solving the specific yet fundamental class of problems that currently
cannot be tackled by these classical techniques. The digital approach
we employ here is based on the Hamiltonian formulation of gauge
theories, and enables direct access to the system wavefunction. As
we show below, this allows us to investigate entanglement generation
during particle-antiparticle production, emphasizing a novel perspective
on the dynamics of the Schwinger mechanism.

Digital quantum simulations described in the present work are conceptually
different from, and fundamentally more challenging than,
previously reported condensed-matter-motivated simulations of spin
and Hubbard-type models. In gauge theories, local symmetries
lead to the introduction of dynamical gauge fields obeying a Gauss law.
Formally, this crucial feature is described by local symmetry generators
$\{\hat{G}_i\}$ that commute with the Hamiltonian of the system, $[\hat{H}, \hat{G}_i] = 0$, and
restrict the dynamics to a subspace of physical states $|\Psi_{\mathrm{physical}}\rangle$ which
satisfy $\hat{G}_i|\Psi_{\mathrm{physical}}\rangle = q_i|\Psi_{\mathrm{physical}}\rangle$, where $q_i$ are background charges. We
will be interested in the case $q_i = 0$ for all $i$ (see Methods). Realizing
such a constrained dynamics on a quantum simulator is demanding
and has been the focus of theoretical research. Instead, to optimally
use the finite resources represented by a few qubits of existing
quantum hardware, we encode the gauge degrees of freedom in a long-range
interaction between the fermions (electrons and positrons),
which can be implemented efficiently on our experimental platform.
This allows us to explore quantum simulation of coherent real-time


Figure 1 | Quantum simulation of the Schwinger mechanism. a, The 
instability of the vacuum due to quantum fluctuations is one of the most 
fundamental effects in gauge theories. We simulate the coherent real-time 
dynamics of particle-antiparticle creation by realizing the Schwinger 
model (one-dimensional quantum electrodynamics) on a lattice, as 
described in the main text. b, The experimental setup for the simulation 
consists of a linear Paul trap, where a string of $^{40}$Ca$^{+}$ ions is confined.
The electronic states of each ion, depicted as horizontal lines, encode
a spin $|\uparrow\rangle$ or $|\downarrow\rangle$. These states can be manipulated using laser beams
(see Methods for details).

Figure 2 | Encoding Wilson's lattice gauge theories in digital quantum
simulators. Matter fields, represented by one-component fermion fields
$\hat{\Phi}_n$ at sites $n$, interact via equation (1) with gauge variables defined on the
links connecting the sites. a, Unoccupied odd (occupied even) sites,
represented by filled (empty) circles, indicate the presence of an electron
(positron). b, Gauge variables (shown as horizontal blue thick lines) are
represented by operators $\hat{L}_n$ with integer eigenvalues $L_n = 0, \pm 1, \ldots, \pm\infty$.
c, By mapping the fields $\hat{\Phi}_n$ to Pauli operators $\hat{\sigma}_n$, we obtain a spin model
(the spins are represented by filled/empty arrows). In this language, the
Gauss law governing the interaction of fermions and gauge variables reads
$\hat{L}_n - \hat{L}_{n-1} = \frac{1}{2}[\hat{\sigma}_n^z + (-1)^n]$, where $\hat{\sigma}_n^z$ is the diagonal Pauli matrix. The
realization of the Schwinger model on a small-scale device requires an
optimized use of resources. We achieve this by eliminating the gauge fields
at the cost of obtaining a model with long-range couplings (and additional


dynamics with four qubits, exemplified here by the creation of 
electron-positron pairs (Fig. 1). 

To this end, we experimentally study the Schwinger model, which
describes quantum electrodynamics in one dimension. This model is
extensively used as a testbed for lattice gauge theories as it shares many
important features with quantum chromodynamics, including confinement,
chiral symmetry breaking, and a topological theta vacuum.
In the Kogut-Susskind Hamiltonian formulation of the Schwinger
model,

$$\hat{H}_{\mathrm{lat}} = -i w \sum_{n=1}^{N-1} \left[\hat{\Phi}_n^{\dagger} e^{i\hat{\theta}_n} \hat{\Phi}_{n+1} - \mathrm{h.c.}\right] + J \sum_{n=1}^{N-1} \hat{L}_n^2 + m \sum_{n=1}^{N} (-1)^n \hat{\Phi}_n^{\dagger} \hat{\Phi}_n \qquad (1)$$

describes the interaction of fermionic field operators $\hat{\Phi}_n$ at sites
$n = 1 \ldots N$ with gauge fields that are represented by the canonically commuting
operators $[\hat{\theta}_n, \hat{L}_m] = i\delta_{n,m}$. $\hat{L}_n$ and $\hat{\theta}_n$ correspond to the electromagnetic
field and vector potential on the connection between sites
$n$ and $n+1$. The latter can be eliminated by a gauge transformation (see
Methods). The fields $\hat{\Phi}_n$ represent Kogut-Susskind fermions (Fig. 2),
where the presence of an electron (positron) is mapped to an unoccupied
odd (occupied even) lattice site, allowing for a convenient incorporation
of particles and antiparticles in a single fermion field.
Accordingly, the third term in equation (1), representing the rest mass
$m$, obtains a staggered sign. The first term corresponds to the creation
and annihilation of particle-antiparticle pairs, and the second term
reflects the energy stored in the electric field. Their energy scales
$w = 1/(2a)$ and $J = g^2 a/2$ depend on the lattice spacing $a$ and the
fermion-light coupling constant $g$. We use natural units $\hbar = c = 1$;




local terms). More specifically, the Gauss law determines the gauge fields
for a given matter configuration and background field $\epsilon_0$. The elimination
of the operators $\hat{L}_n$ transforms the original model with nearest-neighbour
terms into a pure spin model with long-range couplings that corresponds
to the Coulomb interaction between the charged particles. d, Coupling
matrix of the resulting interactions for $N = 10$, along with the total spin
Hamiltonian $\hat{H}_S$. For illustration, e shows the couplings involving the
fifth spin. The colours (and thicknesses) of lines represent the different
interaction strengths $c_{ij}$ according to the matrix shown in d. For
implementing $\hat{H}_S$ in a scalable and efficient way, we introduce time steps
of length $T$ (f), each subdivided into three sections (g). In each of these
(length not to scale), one of the three parts of $\hat{H}_S$ is realized as explained in
Methods. h, The protocol for realizing $\hat{H}_{ZZ}$ for $N = 10$. The ions interact
according to the Mølmer-Sørensen (MS) Hamiltonian $\hat{H}_{\mathrm{MSz}}$. During each
short time window of length $\Delta t_I$, a different set of ions is coupled by $\hat{H}_{\mathrm{MSz}}$.


therefore, a and t have the dimension of length, while w, J, m and g have 
the dimension of inverse length. 

To realize the model using trapped ions, we map the fermionic operators
$\hat{\Phi}_n$ to spin operators (Fig. 2a) by a Jordan-Wigner transformation,
which converts the short-range hopping in equation (1) into
nearest-neighbour spin-flip terms. In this formulation, the Gauss
law takes the form $\hat{L}_n - \hat{L}_{n-1} = \frac{1}{2}[\hat{\sigma}_n^z + (-1)^n]$, where $\hat{\sigma}_n^z$ are the Pauli
matrices. This law is the lattice version of the continuum law $\nabla E = \rho$,
where $\rho$ is the charge density. As illustrated in Fig. 2c, the Gauss law
completely determines the electric fields for a given spin configuration
and choice of background field. Following ref. 12, we use this constraint
to eliminate the operators $\hat{L}_n$ from the dynamics, adapting a scheme
that has previously proven advantageous for numerical calculations
to a quantum simulation experiment, where the Gauss law is fulfilled
by construction.
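The way the Gauss law fixes the electric fields can be sketched in a few lines of Python. This is only an illustration: the function name `electric_fields`, the list encoding of a spin configuration by its $\hat{\sigma}^z$ eigenvalues (+1 or -1), and the convention that +1 marks an occupied site are assumptions made for the example.

```python
def electric_fields(spins, eps0=0):
    """Return [L_1, ..., L_{N-1}] for sigma^z eigenvalues spins = [s_1, ..., s_N].

    Uses L_n - L_{n-1} = (s_n + (-1)^n) / 2 with background field L_0 = eps0.
    """
    fields, L = [], eps0
    for n, s in enumerate(spins[:-1], start=1):
        L += (s + (-1) ** n) // 2  # the bracket is always -2, 0 or +2
        fields.append(L)
    return fields

# Assumed convention: s = +1 is an occupied site, so the bare vacuum has
# occupied odd sites and empty even sites.
vacuum = [+1, -1, +1, -1]
one_pair = [-1, +1, +1, -1]   # electron on site 1, positron on site 2
far_pair = [-1, -1, +1, +1]   # electron on site 1, positron on site 4

print(electric_fields(vacuum))    # [0, 0, 0]
print(electric_fields(one_pair))  # [-1, 0, 0]
print(electric_fields(far_pair))  # [-1, -1, -1]
```

The last case shows why separating a pair costs electric-field energy: every link between the two charges carries one unit of flux.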

The elimination of the gauge fields maps the original problem to 
a spin model with long-range interactions that reflect the Coulomb 
interactions between the simulated particles. This allows an efficient 
use of resources, since N spins can be used to simulate N particles and 
their accompanying N — 1 gauge fields. However, as shown in Fig. 2d, 
the required couplings and local terms have a very unusual distance 
and position dependence. The challenge has thus been moved from 
engineering a constrained dynamics of 2N— 1 quantum systems on 
a gauge-invariant Hilbert space to the realization of an exotic and 
asymmetric interaction of N spins. 

Our platform is ideally suited for this task, since long-range interactions
and precise single-qubit operations are available in trapped-ion
systems. These capabilities allow us to realize the required interactions
by means of a digital quantum simulation scheme. To this end, the
desired Hamiltonian, $\hat{H} = \sum_{k=1}^{K} \hat{H}_k$, is split into $K$ parts that can be
directly implemented and are applied separately in subsequent time

Figure 3 | Time evolution of the particle number density, $\nu$. a, We show
the ideal evolution under the Schwinger Hamiltonian $\hat{H}_S$ shown in Fig. 2d,
the ideal evolution considering time discretization errors (see Fig. 2),
the expected evolution including an experimental (exp.) error model
(see Methods) and the experimental data for electric field energy $J = w$
and particle mass $m = 0.5w$ (see equation (1)). After postselection of the
experimental data (see Methods), the remaining populations are {86 ± 2,
79 ± 1, 73 ± 1, 69 ± 1}% after {1, 2, 3, 4} time steps (averaged over all
data sets). Error bars correspond to standard deviations estimated from a
Monte Carlo bootstrapping procedure. The insets show the initial state
of the simulation (left inset), corresponding to the bare vacuum with
particle number density $\nu = 0$, as well as one example of a state containing
one pair (right inset), that is, a state with $\nu = 0.5$, represented as
filled/empty arrows as in Fig. 2. b, Experimental data and c, theoretical
prediction for the evolution of the particle number density $\nu$ as a function
of the dimensionless time $wt$ and the dimensionless particle mass $m/w$,
with $J = w$.


windows. By repeating the sequence multiple times, the resulting time
evolution of the system $U(t)$ closely resembles an evolution where the
individual parts of the Hamiltonian act simultaneously, as can be shown
using the Suzuki-Lie-Trotter expansion:

$$U(t) = e^{-i\hat{H}t} = \lim_{n\to\infty}\left(\prod_{k=1}^{K} e^{-i\hat{H}_k t/n}\right)^{n}$$

Our scheme is depicted in Fig. 2f-h. It allows for an efficient realization 
of the required dynamics and implements the coupling matrix shown 
in Fig. 2d, e with a minimal number of time steps, scaling only linearly 
in the number of sites N. The scheme is therefore scalable to larger 
systems. A discussion of finite size effects can be found in Methods. 
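As a rough numerical illustration of the Trotter idea, the following sketch compares a first-order product formula with the exact propagator for two non-commuting single-qubit terms. The toy Hamiltonian (the Pauli matrices X and Z), the helper names and the chosen step counts are assumptions for the example and are not the gate sequence used in the experiment.

```python
import numpy as np

# Pauli matrices used as two non-commuting toy Hamiltonian parts
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def expm_hermitian(h, t):
    """Return exp(-i h t) for a Hermitian matrix h via eigendecomposition."""
    evals, evecs = np.linalg.eigh(h)
    return (evecs * np.exp(-1j * evals * t)) @ evecs.conj().T

def trotter(parts, t, n):
    """First-order product formula: (prod_k exp(-i H_k t/n))^n."""
    step = np.eye(parts[0].shape[0], dtype=complex)
    for h in parts:
        step = expm_hermitian(h, t / n) @ step
    return np.linalg.matrix_power(step, n)

def trotter_error(parts, t, n):
    """Spectral-norm distance between the product formula and exp(-i H t)."""
    exact = expm_hermitian(sum(parts), t)
    return np.linalg.norm(trotter(parts, t, n) - exact, ord=2)

parts = [X, Z]
errors = {n: trotter_error(parts, 1.0, n) for n in (1, 4, 16, 64)}
for n, err in errors.items():
    print(n, err)
```

The error shrinks as the number of repetitions n grows, which is the sense in which the stroboscopic sequence approaches the evolution under the full Hamiltonian.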

We realize the simulation in a quantum information processor based
on a string of $^{40}$Ca$^{+}$ ions confined in a macroscopic linear Paul trap
(Fig. 1b). There, each qubit is encoded in the electronic states $|\downarrow\rangle = 4S_{1/2}$
(with magnetic quantum number $m = -1/2$) and $|\uparrow\rangle = 3D_{5/2}$ ($m = -1/2$)
of a single ion. The energy difference between these states is in the
optical domain, so the state of the qubit can be manipulated using laser
light pulses. More specifically, a universal set of high-fidelity quantum
operations is available, consisting of collective rotations around the
equator of the Bloch sphere, addressed rotations around the z axis and

Figure 4 | Time evolution of the vacuum persistence amplitude and
entanglement. We show the square of the vacuum persistence amplitude
$|G(t)|^2$ (the Loschmidt echo), which quantifies the decay of the unstable
vacuum, and the logarithmic negativity $E_n$, a measure of the entanglement
between the left and the right halves of the system. a, b, The time evolution
of $|G(t)|^2$ (a) and $E_n$ (b) for different values of the particle mass $m$ and
fixed electric field energy $J = w$, where $w$ is the rate of particle-antiparticle
creation and annihilation (compare equation (1)), as a function of the
dimensionless time $wt$. c, d, The time evolution of $|G(t)|^2$ (c) and $E_n$ (d)
changes for different values of $J$ and fixed particle mass $m = 0$. Circles
correspond to the experimental data and squares connected by solid lines
to the expected evolution assuming an experimental error model explained
in Methods. Error bars correspond to standard deviations estimated from
a Monte Carlo bootstrapping procedure. e, Illustration of the creation of a
particle-antiparticle pair starting from the bare vacuum state.


entangling Mølmer-Sørensen (MS) gates. With a sequence of these
gates, arbitrary unitary operations can be implemented. Thus, we
are able to simulate any Hamiltonian evolution, and in particular the
interactions required here, by means of digital quantum simulation
techniques, as shown in Fig. 2. Each of the implemented time evolutions
consists of a sequence of over 200 quantum gates (see Extended
Data Fig. 3). In order to realize the non-local interactions $\hat{H}_{ZZ}$ and $\hat{H}_{\pm}$
with their specific long-range interactions, we use global MS entangling
gates together with a spectroscopic decoupling method to tailor
the range of the interaction. For the decoupling, the population of the
ions that are not involved in the specific operations is shelved into
additional electronic states that are not affected by the light for the
entangling operations (see Methods). The local terms in $\hat{H}_Z$ correspond
to z rotations that are directly available in our set of operations. The
strength of all terms can be tuned by changing the duration of the laser
pulses corresponding to the physical operations.

Within our scheme, a wide range of fundamental properties in 
one-dimensional lattice gauge theories can be studied. To demonstrate 
our approach, we concentrate on simulating the coherent quantum 
real-time dynamics of the Schwinger mechanism, that is, the creation 
of particle-antiparticle pairs out of the bare vacuum $|\mathrm{vacuum}\rangle$,
where matter is entirely absent (see Methods). After initializing the
system in this state, which corresponds to the ground state for $m \to \infty$
(Fig. 3a), we apply $\hat{H}_S$ (Fig. 2d) for different masses and coupling
strengths. As a first step, we measure the particle number density
$\nu(t) = \frac{1}{2N}\sum_{l=1}^{N}\langle(-1)^l\hat{\sigma}_l^z(t) + 1\rangle$ generated after a simulated time evolution
of duration $t$. The value $\nu = 0.5$ corresponds to a state containing
on average one pair (Fig. 2b). As Fig. 3c shows, an initial phase of rapid
pair creation is followed by a reduction of $\nu(t)$ due to recombination
effects. The measured evolution shows excellent agreement with theoretical
predictions, assuming uncorrelated dephasing with an error
probability $p = 0.038$ per qubit and per step, as explained in Methods.
In Fig. 3b, we probe the particle-antiparticle generation for a broad
range of masses $m$. Larger values of $m$ increase the energy cost of pair
production and thus lead to faster oscillations with a suppressed
magnitude (see also Methods and Extended Data).

Our platform allows direct measurements of the vacuum persistence
amplitude and of the generated entanglement. The vacuum persistence
amplitude $G(t) = \langle\mathrm{vacuum}|e^{-i\hat{H}_S t}|\mathrm{vacuum}\rangle$ quantifies the decay of the
unstable vacuum (see Methods). The associated probability $|G(t)|^2$
shown in Fig. 4a, c, also known as the Loschmidt echo, is important in
contexts such as quantum chaos and dynamical critical phenomena
far from equilibrium.
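For orientation, the ideal (error-free, non-Trotterized) evolution of both observables can be sketched by exact diagonalization of the four-site spin Hamiltonian. The code below is a minimal sketch, not the experimental protocol: it assumes the encoded Hamiltonian of the Methods with zero background field, takes spin up on odd sites and spin down on even sites as the bare vacuum, and uses illustrative helper names (`site_op`, `schwinger_hamiltonian`, `particle_density`).

```python
import numpy as np
from functools import reduce

I2 = np.eye(2, dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)
SP = np.array([[0, 1], [0, 0]], dtype=complex)  # sigma^+ = |up><down|
SM = SP.conj().T

def site_op(op, n, N):
    """Embed a single-site operator at site n (1-based) of an N-site chain."""
    return reduce(np.kron, [op if k == n else I2 for k in range(1, N + 1)])

def schwinger_hamiltonian(N, w, J, m, eps0=0.0):
    """Mass, hopping and electric-field terms with the gauge fields eliminated."""
    dim = 2 ** N
    H = np.zeros((dim, dim), dtype=complex)
    for n in range(1, N + 1):
        H += 0.5 * m * (-1) ** n * site_op(SZ, n, N)
    for n in range(1, N):
        hop = site_op(SP, n, N) @ site_op(SM, n + 1, N)
        H += w * (hop + hop.conj().T)
        L = eps0 * np.eye(dim, dtype=complex)  # field on link n from the Gauss law
        for k in range(1, n + 1):
            L += 0.5 * (site_op(SZ, k, N) + (-1) ** k * np.eye(dim))
        H += J * (L @ L)
    return H

def bare_vacuum(N):
    """Assumed convention: spin up on odd sites, spin down on even sites."""
    up = np.array([1, 0], dtype=complex)
    down = np.array([0, 1], dtype=complex)
    return reduce(np.kron, [up if n % 2 == 1 else down for n in range(1, N + 1)])

def evolve(H, psi0, t):
    """Apply exp(-i H t) to psi0 using the eigendecomposition of H."""
    evals, evecs = np.linalg.eigh(H)
    return evecs @ (np.exp(-1j * evals * t) * (evecs.conj().T @ psi0))

def particle_density(psi, N):
    """nu = (1/2N) sum_l <(-1)^l sigma^z_l + 1>."""
    total = 0.0
    for l in range(1, N + 1):
        total += (-1) ** l * np.vdot(psi, site_op(SZ, l, N) @ psi).real + 1
    return total / (2 * N)

N, w = 4, 1.0
H = schwinger_hamiltonian(N, w=w, J=w, m=0.5 * w)
vac = bare_vacuum(N)
times = np.linspace(0.0, 3.0, 31)
nu = [particle_density(evolve(H, vac, t), N) for t in times]
echo = [abs(np.vdot(vac, evolve(H, vac, t))) ** 2 for t in times]
```

Plotting `nu` and `echo` against `times` gives the ideal curves against which Trotter and experimental errors can be judged; the parameter values $J = w$ and $m = 0.5w$ follow the Fig. 3a caption.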

The vacuum decay continuously produces entanglement, as particles 
and antiparticles are constantly generated and propagate away from 
each other, thus correlating distant parts of the system. Entanglement 
plays a crucial role in the characterization of dynamical processes in 
quantum many-body systems, and its analysis permits us to quantify 
the quantum character of the generated correlations. To this end, we 
reconstruct the density matrix after each time step by full state tomo- 
graphy, and evaluate the entanglement of one half of the system with 
the other by calculating the logarithmic negativity. This quantity is an
entanglement measure for mixed states, defined as $E_n = \log_2\lVert\rho^{T_A}\rVert_1$,
where the trace norm of the partially transposed density matrix $\rho^{T_A}$ exceeds 1
by twice the magnitude of the sum of its negative eigenvalues.
The entanglement between two contiguous blocks of our spin system 
is equivalent to the entanglement in the simulated fermionic system 
described by equation (1), that is, including the gauge fields (C.A.M. 
et al., manuscript in preparation). In Fig. 4b, d, we show the real-time 
dynamics of the logarithmic negativity for different parameter regimes. 
Entanglement between the two halves of the system is due to the pres- 
ence of a pair distributed across them. Accordingly, less entanglement 
is produced for increasing particle masses m and field energies J. The 
latter has a stronger influence, as it not only raises the energy cost 
for the creation of a pair but also for increasing the distance between 
particle and antiparticle. 
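A minimal numerical sketch of the logarithmic negativity is given below, assuming the standard definition $E_n = \log_2\lVert\rho^{T_A}\rVert_1$ for a bipartition into subsystems A and B. The function names and the two-qubit test states are illustrative choices, not taken from the experiment.

```python
import numpy as np

def partial_transpose(rho, dim_a, dim_b):
    """Transpose subsystem A of a density matrix on H_A (x) H_B."""
    r = rho.reshape(dim_a, dim_b, dim_a, dim_b)
    return r.transpose(2, 1, 0, 3).reshape(dim_a * dim_b, dim_a * dim_b)

def logarithmic_negativity(rho, dim_a, dim_b):
    """E_n = log2 of the trace norm of the partially transposed state."""
    evals = np.linalg.eigvalsh(partial_transpose(rho, dim_a, dim_b))
    return float(np.log2(np.sum(np.abs(evals))))

# Two-qubit examples: a maximally entangled state and a product state
bell = np.array([0, 1, 1, 0], dtype=complex) / np.sqrt(2)
rho_bell = np.outer(bell, bell.conj())
rho_product = np.diag([1.0, 0.0, 0.0, 0.0]).astype(complex)

e_bell = logarithmic_negativity(rho_bell, 2, 2)        # one ebit
e_product = logarithmic_negativity(rho_product, 2, 2)  # no entanglement
```

For the four-qubit states reconstructed by tomography, the same function would be called with `dim_a = dim_b = 4` to compare the left and right halves of the chain.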

Our study should be understood as a first step in the effort to simulate
increasingly complex dynamics, including quantum simulations
of lattice gauge theories, that cannot be tackled by classical numerical
methods. Building on these results, future challenges include the
quantum simulation of non-Abelian lattice gauge theories and systems
beyond one dimension.

METHODS


Encoding of the lattice Schwinger model. Our starting point is the Kogut-Susskind
Hamiltonian formulation of the lattice Schwinger model, see equation
(1) in the main text. This model describes one-component fermion fields $\hat{\Phi}_n$ that
are located at lattice sites $n$ and interact with gauge fields that are represented by the
canonically commuting operators $[\hat{\theta}_n, \hat{L}_m] = i\delta_{n,m}$, as illustrated in Fig. 2. $\hat{\theta}_n$ and $\hat{L}_n$
represent the vector potential and electromagnetic field on the link connecting
sites $n$ and $n+1$. The dynamics is constrained by local conservation laws.
Formally, these are described in terms of local symmetry generators
$\hat{G}_n = \hat{L}_n - \hat{L}_{n-1} - \hat{\Phi}_n^{\dagger}\hat{\Phi}_n + \frac{1}{2}[1 - (-1)^n]$. Physical states are eigenstates of these
generators, $\hat{G}_i|\Psi_{\mathrm{physical}}\rangle = q_i|\Psi_{\mathrm{physical}}\rangle$, where the $q_i$ are background charges. In the
continuum limit, we recover the familiar form of the Gauss law $\nabla E = \rho$, where $\rho$ is
the total charge density. The Schwinger Hamiltonian $\hat{H}_{\mathrm{lat}}$ commutes with the local
symmetry generators, $[\hat{H}_{\mathrm{lat}}, \hat{G}_i] = 0$, and does not mix eigenstates of $\hat{G}_i$ with different
eigenvalues. Thus, the Hilbert space is divided into different sectors with different
static charge configurations. We are interested in the case $q_i = 0$, that is, in the zero
charge sector with an equal number of particles and antiparticles. Our dynamics is
therefore constrained by the Gauss law:

$$\hat{L}_n - \hat{L}_{n-1} = \hat{\Phi}_n^{\dagger}\hat{\Phi}_n - \frac{1}{2}\left[1 - (-1)^n\right] \qquad (2)$$


Equation (2) can be understood by considering a fixed field operator $\hat{L}_{n-1}$ and the
adjacent spin at site $n$ to its right. As shown in Fig. 2a, spins in state $|\uparrow\rangle$ ($|\downarrow\rangle$) on an odd
(even) lattice site indicate that this lattice site is in the vacuum state, that is, not
occupied by a particle or antiparticle. Accordingly, $\hat{L}_n = \hat{L}_{n-1}$. Spins in the state
$|\uparrow\rangle$ on even lattice sites (corresponding to positrons) generate (+1) unit of electric
flux to the right, $\hat{L}_n = \hat{L}_{n-1} + 1$. Similarly, spins in the state $|\downarrow\rangle$ on odd lattice sites
(corresponding to electrons) lead to a decrease of one unit, $\hat{L}_n = \hat{L}_{n-1} - 1$. In order
to cast the lattice Schwinger Hamiltonian given in equation (1) in the main text in
the form of a spin model, the one-component fermion operators $\hat{\Phi}_n$ are mapped
to Pauli spin operators by means of a Jordan-Wigner transformation:

$$\hat{\Phi}_n = \prod_{l<n}\left[i\hat{\sigma}_l^z\right]\hat{\sigma}_n^-, \qquad \hat{\Phi}_n^{\dagger} = \prod_{l<n}\left[-i\hat{\sigma}_l^z\right]\hat{\sigma}_n^+$$


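This leads to

$$\hat{H}_{\mathrm{spin}} = w\sum_{n=1}^{N-1}\left[\hat{\sigma}_n^+ e^{i\hat{\theta}_n}\hat{\sigma}_{n+1}^- + \mathrm{h.c.}\right] + \frac{m}{2}\sum_{n=1}^{N}(-1)^n\hat{\sigma}_n^z + J\sum_{n=1}^{N-1}\hat{L}_n^2$$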


where constant terms (energy offsets) have been omitted. Using this expression,
the gauge degrees of freedom are eliminated in a two-step procedure. First, the
operators $\hat{\theta}_n$ are eliminated by a gauge transformation:

$$\hat{\sigma}_n^- \rightarrow \prod_{l<n}\left[e^{-i\hat{\theta}_l}\right]\hat{\sigma}_n^-$$


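In a second step, the electric field operators $\hat{L}_n$ are eliminated iteratively using the
spin version of the Gauss law given in equation (2):

$$\hat{H}_S = \frac{m}{2}\sum_{n=1}^{N}(-1)^n\hat{\sigma}_n^z + w\sum_{n=1}^{N-1}\left[\hat{\sigma}_n^+\hat{\sigma}_{n+1}^- + \mathrm{h.c.}\right] + J\sum_{n=1}^{N-1}\left[\epsilon_0 + \frac{1}{2}\sum_{m=1}^{n}\left[\hat{\sigma}_m^z + (-1)^m\right]\right]^2 \qquad (3)$$

The free parameter $\epsilon_0$ corresponds to the boundary electric field on the link to the
left of the first lattice site (see Fig. 2b, c). Throughout this paper we consider the
case of zero background field, where $\epsilon_0 = 0$.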

The gauge fields do not appear explicitly in this description. Instead,
they effectively generate a non-local long-range interaction that corresponds
to the Coulomb interaction between the simulated charged particles.
So far, this encoding approach has only been employed as a tool for
analytical or numerical calculations. In contrast, we investigate here
the use of this idea for a quantum simulation scheme, that is, the realization
of the Schwinger model in its encoded form in an actual physical system.
This approach has the advantage that, by construction, the dynamics
takes place in the physically allowed subspace where the Gauss law is obeyed.
In typical proposals for the quantum simulation of lattice gauge theories, this is
fulfilled only up to some energy scale, as it is typically imposed energetically or
by exploiting mechanisms where imperfections due to gauge-variant terms are
strongly suppressed.

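Digital quantum simulation of the encoded Schwinger model. We realize $\hat{H}_S$
given in equation (3) by means of a digital quantum simulation scheme, which
will be described in detail elsewhere (C.A.M. et al., manuscript in preparation).
For convenience, we express the simulated Hamiltonian in the form

$$\hat{H}_S = \hat{H}_{ZZ} + \hat{H}_{\pm} + \hat{H}_Z \qquad (4)$$

where the three parts of the Hamiltonian correspond to the two different types of
two-body couplings $\hat{H}_{ZZ}$ and $\hat{H}_{\pm}$ as well as local terms $\hat{H}_Z$:

$$\hat{H}_{ZZ} = J\sum_{n<m} c_{nm}\,\hat{\sigma}_n^z\hat{\sigma}_m^z$$

$$\hat{H}_{\pm} = w\sum_{n}\left(\hat{\sigma}_n^+\hat{\sigma}_{n+1}^- + \hat{\sigma}_{n+1}^+\hat{\sigma}_n^-\right)$$

$$\hat{H}_Z = m\sum_{n} c_n\hat{\sigma}_n^z + J\sum_{n}\tilde{c}_n\hat{\sigma}_n^z$$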

The simulation protocol is based on time-coarse graining, where the desired
dynamics of the Hamiltonian given by equation (3) is obtained within a time-averaged
description. As illustrated in Fig. 2f, the total simulation time $t_{\mathrm{sim}}$ is
divided into individual time windows of duration $T$. During each of these time
windows, a full cycle of the protocol that is described below is performed. This cycle
is repeated multiple times from $t = 0$ to $t = t_{\mathrm{sim}}$ and consists of three sections, as
shown in Fig. 2g. Each of these sections corresponds to one of the three parts of the
desired Hamiltonian given by equation (4). In the first section, $\hat{H}_{ZZ}$ is simulated, in
the second, the nearest-neighbour terms $\hat{H}_{\pm}$ are realized and in the third, the single-particle
rotations $\hat{H}_Z$ are performed. In this way, the simulation scheme uses only
two types of interactions, local rotations and an infinite-range entangling operation

$$\hat{H}_{\mathrm{MSx}} = J_0\sum_{n,m}\hat{\sigma}_n^x\hat{\sigma}_m^x \qquad (5)$$

which is routinely implemented in trapped ions by means of MS gates. In the
following, we explain how the individual parts of the Hamiltonian are realized.
More detailed explanations can be found elsewhere (C.A.M. et al., manuscript in
preparation). The relative strengths of the individual parts of $\hat{H}_S$, $J$, $w$ and $m$, can
be tuned by adjusting the length of the elementary time windows or the strength
of the underlying interaction $J_0$ accordingly.

Long-range interactions $\hat{H}_{ZZ}$. The first part of equation (4) originates from the third
term in equation (3) representing the electric-field energy. It takes the form

$$\hat{H}_{ZZ} = \frac{J}{2}\sum_{n=1}^{N-2}\sum_{m=n+1}^{N-1}(N-m)\,\hat{\sigma}_n^z\hat{\sigma}_m^z \qquad (6)$$

and describes two-body interactions with an asymmetric distance dependence,
where each spin interacts with constant strength with all spins to its left, while the
coupling to the spins on its right decreases linearly with distance (see Fig. 2d, e).
As the number of elements in the spin coupling matrix is proportional to $N^2$, a
brute force digital simulation approach to this problem would require $N^2$ time
steps. Using our protocol, which is inspired by techniques put forward in ref. 34, the
required resources scale only linearly in $N$. This is accomplished using the scheme
illustrated in Fig. 2h. We introduce $N - 2$ time windows, which can be shown
to be the minimal number of time steps required to simulate the Hamiltonian
in equation (6). Each elementary time window has length $\Delta t_I$. In the $n$th time
window, the Hamiltonian


$$\hat{H}_{\mathrm{MSz}}^{(n)} = J_0\sum_{i<j}^{n+1}\hat{\sigma}_i^z\hat{\sigma}_j^z$$

is applied. $\hat{H}_{\mathrm{MSz}}^{(n)}$ is realized by applying the Hamiltonian given in equation (5) in
combination with local rotations, $R(\gamma)\hat{H}_{\mathrm{MSx}}R^{\dagger}(\gamma) = \hat{H}_{\mathrm{MSz}}$, where $R(\gamma) = e^{i\frac{\pi}{4}\sum_n\hat{\sigma}_n^y}$.
The resulting time-averaged Hamiltonian for the first section of the time interval $T$,
$\bar{H} = \frac{\Delta t_I}{T}\sum_{n=1}^{N-2}\hat{H}_{\mathrm{MSz}}^{(n)}$, is proportional to the desired Hamiltonian in equation (6).

As shown in Fig. 2h, only ions 1 to $n + 1$ participate in the entangling interaction
in time step $n$. Since the interaction is implemented via a global beam that couples
to the entire ion string (see Fig. 1b), ions $n + 2$ to $N$ are decoupled by applying
hiding pulses. The population in the qubit states of these ions is transferred to
electronic levels that are not affected by the interaction using suitable laser pulses.
The population in the state $|\downarrow\rangle = 4S_{1/2}$ (magnetic number $m = -1/2$) is transferred
to the state $3D_{5/2}$ ($m = -5/2$), and the population in $|\uparrow\rangle = 3D_{5/2}$ ($m = -1/2$) is
transferred to the state $3D_{5/2}$ ($m = -3/2$) via $4S_{1/2}$ ($m = +1/2$).
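The counting argument behind the linear scaling can be checked with a short sketch. It assumes that time window $n$ couples every pair among ions 1 to $n+1$ with equal strength, and that the target weight of the pair $(n, m)$ with $n < m$ is proportional to $N - m$, as follows from expanding the electric-field term of equation (3). The function names are illustrative.

```python
import numpy as np

def window_sum(N):
    """Count how often each pair (i, j), i < j, is coupled over windows n = 1..N-2."""
    counts = np.zeros((N, N), dtype=int)
    for n in range(1, N - 1):          # window n couples ions 1..n+1
        for i in range(1, n + 2):
            for j in range(i + 1, n + 2):
                counts[i - 1, j - 1] += 1
    return counts

def target_couplings(N):
    """Weights (N - m) of sigma^z_n sigma^z_m for n < m in the long-range term."""
    c = np.zeros((N, N), dtype=int)
    for n in range(1, N + 1):
        for m in range(n + 1, N + 1):
            c[n - 1, m - 1] = N - m
    return c

N = 10
print(np.array_equal(window_sum(N), target_couplings(N)))  # True
```

Each pair $(i, j)$ is active in exactly $N - j$ of the $N - 2$ windows, so equal-length windows already reproduce the asymmetric coupling matrix.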

Nearest-neighbour terms $\hat{H}_{\pm}$. The second part of equation (4),

$$\hat{H}_{\pm} = w\sum_{n=1}^{N-1}\left(\hat{\sigma}_n^+\hat{\sigma}_{n+1}^- + \mathrm{h.c.}\right)$$

corresponds to the creation and annihilation of particle-antiparticle pairs (see
Fig. 2a, c). For realizing this Hamiltonian, the interaction given in equation (5)
needs to be modified not only in range, but also regarding the type of coupling.
This is accomplished by dividing the time window dedicated to realizing $\hat{H}_{\pm}$ (see
Fig. 2g) into $N - 1$ elementary time slots of length $\Delta t_{II}$. Each of these is used for
inducing the required type of interaction between a specific pair of neighbouring
ions. For example, the first elementary time slot of length $\Delta t_{II}$ is used to engineer
an interaction of the type $\hat{H}_{\pm} \propto \hat{\sigma}_1^+\hat{\sigma}_2^- + \mathrm{h.c.}$ between the first and the second spin,
the second time slot is used to do the same for the second and the third spin, and
so on. This can be done by applying suitable hiding pulses to all spins except for a
selected pair of ions $i$ and $j$. The selected pair undergoes a sequence of gates, which
transforms the $\hat{\sigma}_i^x\hat{\sigma}_j^x$-type coupling in equation (5) into an interaction of the
required form and consists of four steps: (i) a single-qubit operation on the two
selected spins $i$ and $j$, $U = e^{i\frac{\pi}{4}(\hat{\sigma}_i^z + \hat{\sigma}_j^z)}$; (ii) an evolution under the Hamiltonian given
in equation (5) for the selected pair of spins, $\hat{H}_{\mathrm{MSx}}^{(i,j)}$, during a time $\Delta t_{II}/2$,
$e^{-i\hat{H}_{\mathrm{MSx}}^{(i,j)}\Delta t_{II}/2}$; (iii) another single-qubit operation $U^{\dagger}$; and finally (iv) another
two-qubit gate $e^{-i\hat{H}_{\mathrm{MSx}}^{(i,j)}\Delta t_{II}/2}$. The time evolution operator associated with the described
sequence of gates is given by $e^{-i\hat{H}_{\pm}^{(i,j)}\Delta t_{II}}$ with

$$\hat{H}_{\pm}^{(i,j)} = \frac{1}{2}\left(\hat{H}_{\mathrm{MSx}}^{(i,j)} + U^{\dagger}\hat{H}_{\mathrm{MSx}}^{(i,j)}U\right) = J_0\left(\hat{\sigma}_i^+\hat{\sigma}_j^- + \mathrm{h.c.}\right)$$

as desired. The relative strength of the nearest-neighbour terms $\hat{H}_{\pm}$ and the long-range
couplings $\hat{H}_{ZZ}$, $w/J$, can be adjusted by tuning the ratio of the lengths of the
elementary time windows $\Delta t_I/\Delta t_{II}$.
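The four-step sequence can be verified numerically for a single pair of spins. The sketch below assumes that the pair interaction is $J_0\hat{\sigma}^x\hat{\sigma}^x$ (each pair counted once) and that the single-qubit operation is $U = e^{i\frac{\pi}{4}(\hat{\sigma}_i^z + \hat{\sigma}_j^z)}$; the values of `J0` and `dt` are arbitrary illustrative numbers.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0, -1.0]).astype(complex)
I2 = np.eye(2, dtype=complex)
SP = (X + 1j * Y) / 2   # sigma^+
SM = (X - 1j * Y) / 2   # sigma^-

def expm_hermitian(h, t):
    """Return exp(-i h t) for a Hermitian matrix h via eigendecomposition."""
    evals, evecs = np.linalg.eigh(h)
    return (evecs * np.exp(-1j * evals * t)) @ evecs.conj().T

J0, dt = 1.0, 0.3                      # illustrative values
H_xx = J0 * np.kron(X, X)              # assumed pair interaction

# U = exp(+i pi/4 (Z_i + Z_j)); the negative "time" supplies the +i sign
U = expm_hermitian(np.kron(Z, I2) + np.kron(I2, Z), -np.pi / 4)
half = expm_hermitian(H_xx, dt / 2)

# Steps (i)-(iv); the rightmost operator acts first
sequence = half @ U.conj().T @ half @ U

H_target = J0 * (np.kron(SP, SM) + np.kron(SM, SP))
print(np.allclose(sequence, expm_hermitian(H_target, dt)))  # True
```

The identity is exact rather than approximate because $\hat{\sigma}^x\hat{\sigma}^x$ and $\hat{\sigma}^y\hat{\sigma}^y$ on the same pair commute, so no Trotter error arises within a single time slot.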

Single-particle terms $\hat{H}_Z$. The last contribution to the Hamiltonian in equation (4)
consists of two terms, $\hat{H}_Z = m\sum_n c_n\hat{\sigma}_n^z + J\sum_n\tilde{c}_n\hat{\sigma}_n^z$. The first term in this expression
reflects the rest masses of the fermions. The second term is an effective single-particle
contribution originating from the third part of equation (3) and corresponds
to a change in the effective fermion masses due to the elimination of the
electric fields. The local terms of the simulated Hamiltonian are given by:

$$\hat{H}_Z = \frac{m}{2}\sum_{n=1}^{N}(-1)^n\hat{\sigma}_n^z - \frac{J}{2}\sum_{n=1}^{N-1}(n \bmod 2)\sum_{l=1}^{n}\hat{\sigma}_l^z$$

These are implemented by means of AC-Stark shifts, induced by laser pulses that
are far red-detuned from the qubit transition.
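The decomposition into two-body and local terms can be cross-checked numerically against the electric-field form of equation (3). The sketch below assumes zero background field, long-range weights $\frac{J}{2}(N-m)$ for pairs $n < m$, and local terms $\frac{m}{2}(-1)^n\hat{\sigma}_n^z - \frac{J}{2}(n \bmod 2)\sum_{l \le n}\hat{\sigma}_l^z$; helper names are illustrative.

```python
import numpy as np
from functools import reduce

I2 = np.eye(2)
SZ = np.diag([1.0, -1.0])
SP = np.array([[0.0, 1.0], [0.0, 0.0]])  # sigma^+

def op(single, n, N):
    """Embed a single-site operator at site n (1-based) of an N-site chain."""
    return reduce(np.kron, [single if k == n else I2 for k in range(1, N + 1)])

def h_full(N, w, J, m):
    """Mass, hopping and squared electric fields obtained from the Gauss law."""
    dim = 2 ** N
    H = sum(0.5 * m * (-1) ** n * op(SZ, n, N) for n in range(1, N + 1))
    for n in range(1, N):
        hop = op(SP, n, N) @ op(SP.T, n + 1, N)
        H = H + w * (hop + hop.T)
        L = sum(0.5 * (op(SZ, k, N) + (-1) ** k * np.eye(dim)) for k in range(1, n + 1))
        H = H + J * (L @ L)
    return H

def h_parts(N, w, J, m):
    """Return the long-range, nearest-neighbour and local parts separately."""
    dim = 2 ** N
    Hzz, Hpm, Hz = (np.zeros((dim, dim)) for _ in range(3))
    for n in range(1, N - 1):
        for k in range(n + 1, N):
            Hzz += 0.5 * J * (N - k) * op(SZ, n, N) @ op(SZ, k, N)
    for n in range(1, N):
        hop = op(SP, n, N) @ op(SP.T, n + 1, N)
        Hpm += w * (hop + hop.T)
    for n in range(1, N + 1):
        Hz += 0.5 * m * (-1) ** n * op(SZ, n, N)
    for n in range(1, N):
        for l in range(1, n + 1):
            Hz -= 0.5 * J * (n % 2) * op(SZ, l, N)
    return Hzz, Hpm, Hz

N = 4
diff = h_full(N, 1.0, 1.0, 0.5) - sum(h_parts(N, 1.0, 1.0, 0.5))
offset = diff[0, 0]
print(np.allclose(diff, offset * np.eye(2 ** N)))  # True: only a constant differs
```

The two forms differ only by a multiple of the identity, which is the omitted energy offset and has no effect on the dynamics.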

Measurement and postselection. For each set of system parameters and number 
of simulation time steps, we perform a full state tomography to determine the 
density matrix that corresponds to the quantum state of the system. The electronic 
state of the ions is detected via a fluorescence measurement using the electron
shelving technique. The entire string is imaged by a CCD camera, performing a
full projective measurement in the z basis. This procedure is repeated 100 times 
to gather sufficient statistics. 

As a consequence of charge conservation, an equal number of particles and
antiparticles is created during the ideal dynamics of the system. Since our evolution
starts with the vacuum state, the physical Hilbert space of the simulation is
spanned by the six states {$|0000\rangle = |\uparrow\downarrow\uparrow\downarrow\rangle$, $|e^-e^+00\rangle = |\downarrow\uparrow\uparrow\downarrow\rangle$, $|0e^+e^-0\rangle = |\uparrow\uparrow\downarrow\downarrow\rangle$,
$|00e^-e^+\rangle = |\uparrow\downarrow\downarrow\uparrow\rangle$, $|e^-00e^+\rangle = |\downarrow\downarrow\uparrow\uparrow\rangle$, and $|e^-e^+e^-e^+\rangle = |\downarrow\uparrow\downarrow\uparrow\rangle$}, where $|0\rangle$
denotes the vacuum, $|e^-\rangle$ a particle and $|e^+\rangle$ an antiparticle. However, experimental
errors during the simulation produce leakage from this subspace, such
that non-physical states such as $|e^-000\rangle = |\downarrow\downarrow\uparrow\downarrow\rangle$ get populated. Therefore, the
raw measured density matrices $\rho_{\mathrm{raw}}$ are projected onto the Hilbert space spanned
by the physical states and normalized,

$$\rho_{\mathrm{phys}} = \frac{P\rho_{\mathrm{raw}}P}{\mathrm{tr}(P\rho_{\mathrm{raw}}P)}$$

where $P$ is the projector onto the physical subspace. All experimental data presented
in this work correspond to physical density matrices $\rho_{\mathrm{phys}}$ postselected in
this way. The populations remaining in the physical subspace along the evolution
are discussed in the following section.
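The postselection step amounts to a projection followed by renormalization, which can be sketched as follows. The string labels (`u` for spin up, `d` for spin down), the helper names and the toy density matrix with 14% leakage are hypothetical and serve only to illustrate the formula.

```python
import numpy as np

# Six physical four-qubit states, assuming spin up on an odd site and spin down
# on an even site denote the vacuum (u = up, d = down).
PHYSICAL = ["udud", "duud", "uudd", "uddu", "dduu", "dudu"]

def index(label):
    """Computational-basis index of a spin string, with u -> 0 and d -> 1."""
    return int(label.replace("u", "0").replace("d", "1"), 2)

def projector(labels, n_qubits=4):
    """Diagonal projector onto the span of the listed basis states."""
    P = np.zeros((2 ** n_qubits, 2 ** n_qubits))
    for lab in labels:
        P[index(lab), index(lab)] = 1.0
    return P

def postselect(rho_raw, P):
    """Return (rho_phys, surviving population tr(P rho_raw P))."""
    projected = P @ rho_raw @ P
    pop = np.trace(projected).real
    return projected / pop, pop

# Hypothetical raw state: 86% bare vacuum, 14% leaked into a non-physical state
P = projector(PHYSICAL)
rho_raw = np.zeros((16, 16))
rho_raw[index("udud"), index("udud")] = 0.86
rho_raw[index("ddud"), index("ddud")] = 0.14
rho_phys, population = postselect(rho_raw, P)
```

The returned `population` is the fraction of the raw state that survives postselection, the quantity reported per time step in the error analysis.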

Experimental errors. The bulk of the quantum gates in the simulation consists of
hiding/unhiding pulses and MS gates. Each $\pi$ pulse on a hiding transition has a
fidelity of around 99.5%, and there are 30 such pulses per step, yielding a lower
bound on the fidelity per step of $(0.995)^{30} = 0.86$. The fidelity of a fully-entangling
($\pi/2$) MS gate on 4 ions is around 97.5%, and one simulation step has 8 quarter-entangling
($\pi/8$) gates, yielding a lower bound of $(0.975)^{8/4} = 0.95$. The total lower
bound for the fidelity per step is $F = (0.995)^{30}\cdot(0.975)^{8/4} \approx 82\%$; it is indeed lower
than the average fidelity of the raw (not postselected) state after the first step, which
is 89%. The sequence performs better than might be expected from the raw
fidelities; we believe this is because the ideal evolution stays at all times in a
decoherence-free subspace.
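The fidelity budget is simple arithmetic and can be reproduced directly from the quoted per-gate numbers. The variable names below are illustrative; treating 8 quarter-entangling gates as equivalent to 2 fully entangling gates follows the exponent 8/4 used in the text.

```python
hiding_fidelity, hiding_pulses = 0.995, 30
ms_fidelity, quarter_gates = 0.975, 8

f_hiding = hiding_fidelity ** hiding_pulses   # 30 hiding/unhiding pi pulses
f_ms = ms_fidelity ** (quarter_gates / 4)     # 8 quarter gates ~ 2 full gates
f_total = f_hiding * f_ms                     # combined lower bound per step

print(f"{f_hiding:.2f} {f_ms:.2f} {f_total:.2f}")  # 0.86 0.95 0.82
```

Multiplying the two quoted bounds therefore gives a combined lower bound of roughly 0.82 per simulation step.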

A useful measure of the performance of the evolution is the population leakage 
from the physical subspace. After {1, 2, 3, 4} evolution time steps, the measured
populations remaining in the physical subspace were on average {86 ± 2, 79 ± 1,
73 ± 1, 69 ± 1}% of the populations before postselection (the average is taken over
the 7 simulation runs shown in the paper). The population loss per simulation 
step seems consistent with the errors induced by the hiding/unhiding operations. 

The remaining errors can be quantified by the average fidelity of the postselected
state with the ideal state. After the first evolution step this is 96%, which is consistent
with the total fidelity of the MS gates. To quantify the performance of the
simulation along the whole evolution, we compare the experimental data to a
simple phenomenological error model. Since the postselection already partially
corrects for population errors, we considered an error model that consists of uncorrelated
dephasing, parameterized with a phase-flip error probability $p$ per qubit
and per evolution time step. The density matrix $\rho$ is then, at each evolution step,
subject to the composition (denoted $\circ$) of the error channels $\mathcal{E}_i$ for each qubit

$$\rho \rightarrow \mathcal{E}_4\circ\mathcal{E}_3\circ\mathcal{E}_2\circ\mathcal{E}_1(\rho)$$

where

$$\mathcal{E}_i(\rho) = (1-p)\rho + p\,\hat{\sigma}_i^z\rho\,\hat{\sigma}_i^z$$

The value for the error probability p was extracted from a fit to all of the experi- 
mental data collected. For all the data taken with non-zero J we found a value of 
p=0.038. Whenever J=0, the simulation does not require any zz interactions. 
Thus, several entangling gates are omitted from the sequence and consequently 
higher fidelities are expected. Indeed, for this case the error probability per time 
step was found to be p=0.031. 
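The dephasing model is straightforward to apply to a density matrix. The following sketch assumes the single-qubit channel $\mathcal{E}_i(\rho) = (1-p)\rho + p\,\hat{\sigma}_i^z\rho\,\hat{\sigma}_i^z$ applied once per qubit and per step; the two-qubit test state and the helper names are illustrative and are not part of the fitting procedure used for the data.

```python
import numpy as np
from functools import reduce

I2 = np.eye(2, dtype=complex)
SZ = np.diag([1.0, -1.0]).astype(complex)

def z_on(i, n_qubits):
    """sigma^z acting on qubit i (0-based) of an n-qubit register."""
    return reduce(np.kron, [SZ if k == i else I2 for k in range(n_qubits)])

def dephase_step(rho, p, n_qubits):
    """Apply E_i(rho) = (1 - p) rho + p Z_i rho Z_i to every qubit i in turn."""
    for i in range(n_qubits):
        Zi = z_on(i, n_qubits)
        rho = (1 - p) * rho + p * Zi @ rho @ Zi
    return rho

# Illustrative check on two qubits prepared in |+>|+>
p = 0.038
plus = np.ones(4, dtype=complex) / 2
rho0 = np.outer(plus, plus.conj())
rho1 = dephase_step(rho0, p, n_qubits=2)
```

Populations are untouched, while a coherence between basis states that differ on $k$ qubits is damped by a factor $(1-2p)^k$ per step; this is why the model complements the postselection, which only addresses population errors.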

Quantum simulation of the Schwinger mechanism. We simulate the coherent 
real-time dynamics in the Schwinger model focusing on the Schwinger mecha- 
nism, that is, spontaneous particle—antiparticle production out of the unstable 
vacuum. This effect is at the heart of quantum electrodynamics and its observation 
is currently pursued at high intensity laser facilities ELI and XCELS'* (theoretical 
proposals for its quantum simulation can for example be found elsewhere®”*> °°), 
To simulate the dynamics of pair creation, we consider as is usual”!° the bare 
vacuum as initial state, where matter is completely absent, |vacuum⟩ = |0000⟩. In
the spin representation this state is accordingly given by ||| ||). Note that the bare 
vacuum is different from the so-called dressed vacuum state, which is the ground 
state of the full Hamiltonian. 

Decay of the vacuum. The natural quantity characterizing the decay of the unstable
vacuum is the vacuum persistence amplitude introduced by Schwinger³⁷, which is
defined as the overlap of the initial state |Ψ(0)⟩ = |vacuum⟩ with the time-evolved state

$$G(t) = \langle \mathrm{vacuum} |\, e^{-i\hat{H}_S t}\, | \mathrm{vacuum} \rangle$$

Within the original formulation, the Schwinger mechanism was considered for the
continuum system and a classical electric field of strength E (ref. 37). There, it has
been shown that the particle number density ν(t) is directly related to the rate function
λ(t) that characterizes the decay of the vacuum persistence probability |G(t)|²:

$$\lambda(t) = -\lim_{N\to\infty} \frac{1}{N} \log\left[\,|G(t)|^2\,\right]$$

Specifically, in the limit of large fermion masses m ≫ qE with q the electric
charge, as relevant in the high-energy context, λ(t) = ν(t) for thermodynamically
large systems in the continuum.
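
For a finite system, the persistence amplitude and the rate function can be evaluated by exact diagonalization. The sketch below is only an illustration under stated assumptions: the helper names are hypothetical, and a two-level toy Hamiltonian stands in for the lattice Hamiltonian, which is not written out in this section.

```python
import numpy as np

def persistence_amplitude(hamiltonian, psi0, times):
    """G(t) = <psi0| exp(-i H t) |psi0> via the eigen-decomposition of H."""
    energies, vectors = np.linalg.eigh(hamiltonian)
    weights = np.abs(vectors.conj().T @ psi0) ** 2  # overlap with each eigenstate
    return np.array([np.sum(weights * np.exp(-1j * energies * t)) for t in times])

def rate_function(amplitudes, n_sites):
    """Finite-N estimate lambda(t) = -(1/N) log |G(t)|^2."""
    return -np.log(np.abs(amplitudes) ** 2) / n_sites

# Toy two-level example (an assumption, not the Schwinger Hamiltonian):
# H = w * sigma_x with the system started in the first basis state.
w = 1.0
H_toy = w * np.array([[0.0, 1.0], [1.0, 0.0]])
psi0 = np.array([1.0, 0.0])
times = np.linspace(0.0, 1.0, 6)
G = persistence_amplitude(H_toy, psi0, times)
lam = rate_function(G, n_sites=1)
```

For this toy case G(t) = cos(wt), so the rate function starts at zero and grows as the initial state is depleted.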

Since vacuum persistence amplitudes have so far not been measured, this connection
between λ(t) and ν(t) has not yet been tested experimentally. In Extended
Data Fig. 1, we show the measured rate function λ(t) and find good qualitative
agreement with ν(t), even for the few qubits in our digital quantum simulation.

Finite size effects. In the following, we discuss the dependence of the results on the
number of lattice sites N. Extended Data Fig. 2 shows the time evolution of the
particle number density and the entanglement for different system sizes N. For our
experimental system with N = 4, we already find qualitative agreement with respect
to the results expected for larger N. By scaling up the system, the dynamics quickly
converges for the considered parameters. We address elsewhere the continuum
limit a → 0, N → ∞ for fixed values of the coupling g and the mass m (C.A.M.
et al., manuscript in preparation).

Extended Data Figure 1 | Comparison of the evolutions of the particle
number density ν(t) and the rate function λ(t). The decay of the
vacuum persistence probability is characterized by the rate function λ(t),
defined by |G(t)|² = e^{−Nλ(t)}. a, b, Time evolution of ν(t) (a) and λ(t) (b)
for different values of the particle mass m and fixed electric field energy
J = w, where w is the rate of particle-antiparticle creation and annihilation
(see equation (1) in the main text). c, d, Evolution of ν(t) (c) and λ(t) (d)
for different values of J and fixed particle mass m = 0 as a function of the
dimensionless time wt. e, Comparison of the evolutions of ν(t) and λ(t)
for J = w and masses m = 0 (upper two curves) and m = w/2 (lower two
curves). Error bars correspond to standard deviations estimated from a
Monte Carlo bootstrapping procedure.

Extended Data Figure 2 | Finite size effects. Evolution of the particle
number density $\nu(t) = \frac{1}{2N}\sum_{l=1}^{N}\langle(-1)^{l}\hat{\sigma}_l^{z}(t)+1\rangle$ (top) and the logarithmic
negativity E_n (bottom) for different system sizes N. The logarithmic
negativity is evaluated with respect to a cut in the middle of the considered
spin chain and quantifies the entanglement between the two halves of the
system. Both quantities are shown as a function of the dimensionless
time wt for J = m = w. The shaded area corresponds to the time interval
explored in this experiment.

Extended Data Figure 3 | Experimental pulse sequence. This laser pulse
sequence implements the evolution described in Fig. 2f, g. The pulses are
listed in the order in which they are applied, as indicated by the arrows.
The pulses in the first box prepare the initial state, those in the second
box implement one step of the time evolution, and those in the third box
recouple the ions to the computational subspace, that is, bring back their
populations to the qubit transition 4S1/2(m = −1/2) to 3D5/2(m = −1/2).
The operations shown in the middle box are repeated once per evolution
step, resulting in a total number of 12 + 51 × 4 + 6 = 222 pulses for
4 evolution steps. The pulses are labelled in the form Pulse(θ, φ, target
qubit), where θ is the rotation angle (length) of the pulse, φ its phase,
and the target qubit is an integer from 1 to 4 for addressed operations
or ‘all’ for global operations. ‘R’ denotes a pulse on the qubit transition
4S1/2(m = −1/2) to 3D5/2(m = −1/2). ‘MS’ corresponds to an MS gate
on the same transition. The hiding pulses ‘HidingA,B,C’ are applied
on the transitions as follows: A, 4S1/2(m = −1/2) to 3D5/2(m = −5/2);
B, 4S1/2(m = +1/2) to 3D5/2(m = −1/2); C, 4S1/2(m = +1/2) to
3D5/2(m = −3/2). These transitions are shown in the level scheme at
the bottom right. The pulses shown in italics serve the purpose of
correcting addressing crosstalk.

INITIAL PREPARATION

% VACUUM PREPARATION
R(π, 0, 1)
R(π, 0, 3)
R(0.07π, 0.65π, 2)
R(0.01π, 0.9π, 4)

% DECOUPLE 4
HidingA(π, 0, 4)
HidingB(π, 0, 4)
HidingC(π, 0, 4)

% DECOUPLE 3
HidingA(π, 0, 3)
HidingB(π, 0, 3)
HidingC(π, 0, 3)
HidingB(0.04π, 0.65π, 2)
HidingA(0.04π, 0.65π, 2)

PER EVOLUTION STEP (× 4)

%% H± TERM %%

% SIGMA ON 1,2
MS(Δt, 0, all)
MS(Δt, π/2, all)

% RECOUPLE 4, 3
HidingC(π, π, 4)
HidingB(π, π, 4)
HidingA(π, π, 4)
HidingC(0.02π, 1.5π, 3)
HidingA(0.02π, 1.5π, 3)
HidingC(π, π, 3)
HidingB(π, π, 3)
HidingA(π, π, 3)
HidingB(0.03π, 1.65π, 2)
HidingA(0.03π, 1.65π, 2)

% DECOUPLE 1, 2
HidingA(π, 0, 1)
HidingB(π, 0, 1)
HidingC(π, 0, 1)
HidingC(0.03π, 0.6π, 1)
HidingA(0.03π, 0.6π, 1)
HidingB(0.02π, 0.65π, 2)
HidingA(0.02π, 0.65π, 2)
HidingA(π, 0, 2)
HidingB(π, 0, 2)
HidingC(π, 0, 2)

% SIGMA ON 3,4
MS(Δt, 0, all)
MS(Δt, π/2, all)

% RECOUPLE 2
HidingC(π, π, 2)
HidingB(π, π, 2)
HidingA(π, π, 2)
HidingC(0.04π, 0.1π, 1)
HidingA(0.04π, 0.1π, 1)

% DECOUPLE 4
HidingA(π, 0, 4)
HidingB(π, 0, 4)
HidingC(π, 0, 4)
HidingA(0.06π, 0.6π, 3)
HidingB(0.06π, 0.6π, 3)

% SIGMA ON 2,3
MS(Δt, 0, all)
MS(Δt, π/2, all)

% RECOUPLE 1
HidingC(π, π, 1)
HidingB(π, π, 1)
HidingA(π, π, 1)

%% HZ TERM %%
Z((2m+2J)Δt, 1)
Z(JΔt, 2)
Z((2m+J)Δt, 3)

%% HZZ TERM %%

% MSZ GATE ON 1,2,3
R(π/2, π/2, all)
MS(Δt, 0, all)
R(π/2, −π/2, all)

% DECOUPLE 3
HidingA(π, 0, 3)
HidingB(π, 0, 3)
HidingC(π, 0, 3)

% MSZ GATE ON 1,2
R(π/2, π/2, all)
MS(Δt, 0, all)
R(π/2, −π/2, all)

FINAL RECOUPLING

% RECOUPLE 3
HidingC(π, π, 3)
HidingB(π, π, 3)
HidingA(π, π, 3)

% RECOUPLE 4
HidingC(π, π, 4)
HidingB(π, π, 4)
HidingA(π, π, 4)


Solid-state harmonics beyond the atomic limit


Georges Ndabashimiye, Shambhu Ghimire, Mengxi Wu, Dana A. Browne, Kenneth J. Schafer, Mette B. Gaarde &
David A. Reis


Strong-field laser excitation of solids can produce extremely 
nonlinear electronic and optical behaviour. As recently 
demonstrated, this includes the generation of high harmonics 
extending into the vacuum-ultraviolet and extreme-ultraviolet 
regions of the electromagnetic spectrum’ *. High harmonic 
generation is shown to occur fundamentally differently in solids and 
in dilute atomic gases!~®-13, How the microscopic mechanisms in 
the solid and the gas differ remains a topic of intense debate!"1)14"18, 
Here we report a direct comparison of high harmonic generation in 
the solid and gas phases of argon and krypton. Owing to the weak 
van der Waals interaction, rare (noble)-gas solids are a near-ideal 
medium in which to study the role of high density and periodicity in 
the generation process. We find that the high harmonic generation 
spectra from the rare-gas solids exhibit multiple plateaus extending 
well beyond the atomic limit of the corresponding gas-phase 
harmonics measured under similar conditions. The appearance of 
multiple plateaus indicates strong interband couplings involving 
multiple single-particle bands. We also compare the dependence 
of the solid and gas harmonic yield on laser ellipticity and find 
that they are similar, suggesting the importance of electron-hole 
recollision in these solids. This implies that gas-phase methods such 
as polarization gating for attosecond pulse generation and orbital 
tomography could be realized in solids. 

Following the initial discovery of nonperturbative high- 
harmonic generation in solids!, several experimental? ® and theoret- 
ical'®'1-4!5!7 investigations have aimed to understand its detailed 
microscopic mechanism. In particular, the roles of the high density,
periodicity and bonding, and how they relate to atomic high harmonic
generation (HHG), remain elusive. One striking difference
between the harmonics from solids and from gases is in the scaling 
of the high-energy cutoff. In experiments on several materials (ZnO,
SiO₂ and GaSe) with pump wavelengths spanning the terahertz to
the near-infrared regions of the spectrum, the cutoff was found to 
scale linearly with the electric field for the solid! *°, whereas it scales 
linearly with the intensity for dilute gases'”'’. In addition, in ZnO 
the ellipticity dependence was observed to be much weaker than in 
atomic gases’*!?°. The field-dependence of the cutoff and weak 
ellipticity dependence are consistent with a semi-classical Bloch oscillation
model for the nonlinear intraband acceleration of electrons
that have tunnelled across the direct bandgap'*>'®. The extent of the 
cutoff, to well beyond the maximum Bloch frequency, further suggests 
that the process is sensitive to details of the band structure through 
interactions beyond nearest neighbours!*. However, interband con- 
tributions could also lead to a cutoff that is linear in the applied field, 
and the relative roles of inter- and intraband currents remain a topic
of intense debate’ !b!4"18, 

In the generalized recollision picture proposed by Vampa 
et al.'*, electrons in the conduction band recombine with their asso- 
ciated holes in the valence band in a manner that can be described 
by semi-classical trajectories. In this model, the energy of interband 
transitions is constrained to be less than the maximum band sepa- 
ration. Wu et al. have proposed that higher-lying conduction bands 


can give rise to multiple plateaus in the HHG spectrum, each with a 
cutoff that is limited only by the field-dressed energy spacing between 
bands!®, Using a semiconductor Bloch equations treatment, Schubert 
et al. found that at far-infrared wavelengths, the HHG spectrum is 
dominated by intraband dynamics?. 

Until now, experiments have concentrated on covalently!+"8 bonded 
crystals. In such crystals the overlap of the atomic and molecular 
wavefunctions leads to a strong modification of the electronic states, 
making it difficult to extract material-independent aspects of the 
strong-field process. Rare-gas solids (RGS) are the nearest thing to a 
three-dimensional array of isolated atoms at high density”', owing to 
the closed shell structure and high ionization potential of rare gases, 
and their weak bonding due to van der Waals interactions. Here we 
study the HHG from Ar and Kr in both gas and solid phases. We find 
that the HHG spectra from the solids exhibit multiple plateaus extend- 
ing beyond both single-atom predictions and our measured gas har- 
monics for the same laser parameters. In addition, the photon energy 
of the solid-state harmonics greatly exceeds the maximum band sep- 
aration between the highest-valence and lowest-conduction bands, 
suggesting the importance of solid-state effects and electronic band 
structure even in weakly bonded RGS. 

The experiments were performed on 5-µm-thick polycrystalline Ar
and Kr RGS and 3 Torr of Ar and Kr in a 1-mm-long cell for the gas 
(see Methods). The targets were irradiated by beams of 50 fs, 1,333 nm 
(0.93 eV) and 50 fs, 1,500 nm (0.82 eV) generated by a 40-fs, 1-kHz 
amplified Ti:sapphire laser-pumped optical parametric amplifier. 
The intensity of the infrared beam was calibrated using the measured 
spectral cutoff in the Ar and Kr gas, assuming that the cutoff energy
is given by the microscopic value!”!, E_cutoff = I_p + 3.17U_p, where I_p is
the ionization energy and U_p is the ponderomotive energy of a free
electron in the laser field.
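
The calibration step can be illustrated with a short calculation. The sketch below assumes the standard expression for the ponderomotive energy of a free electron in a linearly polarized field, U_p = e²I/(2cε₀mω²), which is not spelled out in the text; the function names are hypothetical, and the argon ionization energy of 15.7 eV is the value quoted in the Fig. 3 caption.

```python
import math

# Physical constants (SI units).
E_CHARGE = 1.602176634e-19   # C
E_MASS = 9.1093837015e-31    # kg
C_LIGHT = 2.99792458e8       # m/s
EPS0 = 8.8541878128e-12      # F/m

def ponderomotive_energy_ev(intensity_w_cm2, wavelength_nm):
    """U_p = e^2 I / (2 c eps0 m omega^2) for linear polarization, in eV."""
    intensity = intensity_w_cm2 * 1e4                      # W/cm^2 -> W/m^2
    omega = 2 * math.pi * C_LIGHT / (wavelength_nm * 1e-9)  # rad/s
    u_p_joule = E_CHARGE**2 * intensity / (2 * C_LIGHT * EPS0 * E_MASS * omega**2)
    return u_p_joule / E_CHARGE

def gas_cutoff_ev(intensity_w_cm2, wavelength_nm, ionization_ev):
    """Gas-phase cutoff law E_cutoff = I_p + 3.17 U_p."""
    return ionization_ev + 3.17 * ponderomotive_energy_ev(intensity_w_cm2, wavelength_nm)

def intensity_from_cutoff(cutoff_ev, wavelength_nm, ionization_ev):
    """Invert the cutoff law; U_p is linear in intensity."""
    u_p_per_w_cm2 = ponderomotive_energy_ev(1.0, wavelength_nm)
    return (cutoff_ev - ionization_ev) / (3.17 * u_p_per_w_cm2)

# Ar gas at 26 TW cm^-2 and 1,333 nm.
cutoff_ar = gas_cutoff_ev(26e12, 1333.0, 15.7)
```

Under these assumptions the projected gas-phase cutoff for Ar at 26 TW cm⁻² and 1,333 nm comes out at roughly 29 eV, and `intensity_from_cutoff` performs the calibration in the other direction, from a measured cutoff to an intensity.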

Figure 1 shows representative high harmonic spectra from solid Ar 
and Kr at two intensities each, using the 1,333-nm pump (16 TW cm⁻²
and 26 TW cm⁻² for Ar in Fig. 1a and 6.9 TW cm⁻² and 11.4 TW cm⁻²
for Kr in Fig. 1b). The spectra from Ar and Kr are qualitatively similar,
although the laser intensities for Kr are about a factor of two lower 
than Ar. A single plateau is evident in each at the lower intensity. 
For the higher intensity a second plateau is evident at ~25-33 eV for 
Ar and at ~19-31 eV for Kr. Figure 2a shows the harmonic spectra 
as a function of intensity for Ar at the 1,333-nm pump. As can be 
seen here, the second plateau appears suddenly over a very narrow 
intensity range and comprises several harmonics. The presence of 
the second plateau is in contrast to the experiment on HHG in dilute 
atomic gases, including those measured here, where only a single 
plateau is observed. Also shown in Fig. 2b are theoretical calculations, 
described below. 
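
The plateau energies quoted above can be related to harmonic orders of the 1,333-nm pump with a few lines of arithmetic. The helper names below are hypothetical, and the restriction to odd orders is an assumption of this sketch.

```python
HC_EV_NM = 1239.841984  # h*c in eV nm

def photon_energy_ev(wavelength_nm):
    """Photon energy of the pump in eV."""
    return HC_EV_NM / wavelength_nm

def harmonic_energy_ev(order, wavelength_nm):
    """Photon energy of a given harmonic order in eV."""
    return order * photon_energy_ev(wavelength_nm)

def nearest_odd_harmonic(energy_ev, wavelength_nm):
    """Closest odd harmonic order to a photon energy (odd orders assumed)."""
    ratio = energy_ev / photon_energy_ev(wavelength_nm)
    return 2 * round((ratio - 1) / 2) + 1

# Edges of the second plateau in solid Ar (~25-33 eV) for the 1,333-nm pump.
first_order = nearest_odd_harmonic(25.0, 1333.0)
last_order = nearest_odd_harmonic(33.0, 1333.0)
```

With a pump photon energy of about 0.93 eV, the ~25-33 eV plateau in solid Ar corresponds to harmonic orders 27 to 35.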

The high-energy extent of the second plateau notably exceeds the
cutoff for the gas for the same drive wavelength and intensity. At moderate
peak intensities (10 TW cm⁻² for Kr and 20 TW cm⁻² for Ar),
the secondary plateau is almost entirely beyond the projected gas-phase
cutoff. Figure 3 shows the cutoff as a function of intensity for
Ar and Kr RGS and rare gases for the two different pump wavelengths.

Figure 1 | Representative spectra of HHG from solid Ar and Kr on a
logarithmic scale for the driving wavelength of 1,333 nm.
a, HHG spectra from solid Ar. The spectrum taken at low intensity
has only harmonics of the first plateau. At higher intensity, there are
harmonics in a second plateau that start from the 27th harmonic (25 eV)
and end at the 35th harmonic (32.5 eV). A third plateau is present,
but its harmonics are dimmer and only its first two harmonics can be
distinguished from the background. b, HHG spectrum from solid Kr.
The spectra are taken at different spectrometer configurations and
two spectra from different spectrometer configurations have been
concatenated for the higher intensity. The spectra of solid Kr behave in a
way similar to that of the spectrum of solid Ar except that for solid Kr the
harmonics in the second plateau start at the 21st harmonic (19.5 eV) and
end at the 37th harmonic (34.4 eV) and are much dimmer than in solid Ar.

Figure 2 | Evolution of the HHG spectrum as a function of the laser
intensity. The colour scale shows the logarithm of the intensity.
a, Experimental data. HHG of solid Ar using 1,333-nm drive laser.
At moderate peak intensities, the high-energy cutoff increases smoothly
(up to the 27th harmonic). At around 20 TW cm⁻² the spectral cutoff
increases suddenly to the 35th harmonic. The first plateau is brighter
than the second at all intensities. b, Theoretical results. HHG spectrum
obtained by solving the time-dependent Schrödinger equation for a
four-level system in which the energy separation and couplings between
levels correspond to those of the Ar band-structure at the zone centre
(Γ point). The calculation qualitatively and semiquantitatively reproduces
the experimental data. The dashed rectangle shows the corresponding
range of the experimental data. The inset shows the energy levels used in
the calculations, with arrows representing the coupling between levels.
The dashed white curve indicates the prediction for the cutoff energy of
the second plateau based on the energy difference between field-dressed
levels 1 and 3.

Figure 3 | Comparison of the spectral cutoff of HHG in solid and gas
Ar and Kr as a function of the laser intensity. The dotted straight lines
are fits of the linear cutoff of gas harmonic (blue for Ar and red for Kr).
At zero laser intensity the fit was constrained to be at the ionization
energies of the Ar and Kr gases (I_p = 15.7 eV and 14 eV, respectively).
a, Spectral cutoff of the harmonics of 1,333 nm. For solid Ar and Kr the
cutoff is not linear in the electrical field or intensity below the harmonics
of photon energy 25 eV or 20 eV, respectively. In these laser intensity
regions the cutoff is below that of a corresponding gas HHG (Ar and Kr).
Above these photon energies, the cutoff curves abruptly turn to an almost
vertical slope which indicates a switching on of new HHG processes for
both solid Ar and Kr. Above 33 eV for Ar the cutoff slope decreases to
a value almost equal to the slope before the sudden rise (below 25 eV).
However, for solid Kr after the sudden increase in the cutoff slope, which
extends from around 20 eV to around 35 eV, no higher harmonic was observed
above 35 eV even with increasing laser intensity. The error bars are the
standard deviation of repeated cutoff measurements of solid Ar HHG.
b, Spectral cutoff of the harmonics of 1,500 nm. For both solid Ar and Kr,
the HHG cutoff respectively below 25 eV and 20 eV is almost linear in
intensity and this cutoff is below that of the corresponding gases. Above
these photon energies the slope of the cutoff curves becomes vertical, as
for the 1,333-nm case. However, for solid Kr the sudden increase in the
cutoff energy ends at 31 eV, and a less steep cutoff develops between 31 eV
and 41 eV. c, Comparison of measured cutoff and calculated cutoff. There
is qualitative agreement between the measurement and the calculation.
The error bars are the standard deviation of repeated cutoff measurements
of solid Ar HHG.

Figure 4 | Comparison of ellipticity dependence of the 25th and 31st
harmonics from solid and gas-phase Ar at similar peak fields. The 25th
and 31st harmonics lie in the first and second plateaus, respectively, for
solid Ar. The harmonic intensity shows a similar ellipticity dependence for
the two orders from both the solid and gas. The sensitivity to ellipticity is
at least as strong in the solid as in the gas.

The cutoff of the solid harmonics increases with increasing intensity
in a nontrivial way, neither following the linear cutoff in intensity of
the gas nor the square-root of intensity seen for other solids. The RGS
cutoff is below the rare-gas cutoff at low intensity; however, with the
sudden onset of a second plateau (as seen in Fig. 1) the RGS cutoff
notably exceeds the rare-gas cutoff for the same drive wavelength and
intensity. Even at moderate peak intensities (10 TW cm⁻² for Kr and
20 TW cm⁻² for Ar) the second plateau is almost entirely beyond the
projected gas phase cutoff. The maximum photon energy detected
using both wavelengths exceeds the maximum separation between
the highest-valence and the lowest-conduction bands. According to
Bacalis et al.?, this separation is around 19 eV in solid Ar and 16 eV
in solid Kr. In fact, the maximum band separation is below the higher
end of the first plateau by about 3 eV in both solids.

The nontrivial scaling of the high-energy cutoff with intensity and
the sudden appearance of multiple plateaus are indicative of complex
solid-state behaviour. The appearance of multiple plateaus can be
understood in a model in which solid-state HHG results from
strong-field-driven transitions of Bloch electrons!*"". This behaviour is linked
to the coupling of pairs of higher-lying conduction bands that are
reached in a step-like process'®. The cutoff energy and the strength
of each plateau depend on the energy separation and the coupling
strength between pairs of bands”, and different plateaus can therefore
exhibit different nonlinear scaling with laser intensity. Two general
features of such multi-band couplings are the sudden appearance of
the second plateau and the different slopes of cutoff energy versus
intensity that are observed for the two plateaus.

We apply this model for HHG in solid argon by solving the
time-dependent Schrödinger equation for a four-band system in which the
energy levels and dipole transition elements originate from a density
functional theory (DFT)-based band structure calculation**. In
ref. 10, we show the formal equivalence between taking a Houston-state
basis, where the electron and hole wavevectors k are a function
of the instantaneous vector potential of the laser, and a Bloch-state
basis, where the electron transitions are between different bands at
fixed k. Therefore, in this single-particle picture both the interband
and the intraband contribution to the HHG yield will be included
when considering discrete states at k = 0, assuming that the tunnelling
is concentrated near the direct gap at the zone centre (Γ point).
Figure 2b shows the calculated harmonic spectrum as a function
of laser intensity. The energies for the relevant electronic states at
Γ are shown in the inset. The first plateau originates in the coupling
between levels 1 and 2, and the second plateau in the coupling
between 2 and 3. The dashed white curve indicates the prediction
for the cutoff energy of the second plateau based on the energy
difference between the field-dressed levels 1 and 3 (see Methods). The
calculated values of the first and second cutoff energies compare
reasonably well to those of the experiment, as shown in Fig. 2b and
Fig. 3c. We note that to get the best agreement on the location and
scaling of the plateaus to the experimental results, we have adjusted
the dipole coupling strengths obtained from DFT (see Methods). In
this case, the coupling strengths between levels 1 and 2, between levels
2 and 3, and between levels 3 and 4 are comparable. This is indicative
of a periodic system in which electrons are strongly localized around
the individual atomic sites rather than the more delocalized electron
behaviour one would find in covalently bonded semiconductors such
as ZnO.


This conclusion is supported by the measured strong ellipticity 
dependence in RGS compared to what was previously measured 
in ZnO!. In Fig. 4, we show our measurements of the elliptic- 
ity dependence for two representative harmonics of gas and solid 
Ar. These harmonics were chosen such that they fall within the 
ranges of the first and second plateaus for the RGS (25th and 31st 
harmonics, respectively). These harmonics show similar ellip- 
ticity dependence for the different orders. Moreover, the sol- 
id-state harmonics are at least as sensitive to ellipticity as is the gas. 
We note that the strong ellipticity dependence in atomic and 
molecular HHG has been attributed to the transverse momentum 
of the tunnel-ionized electron causing the returning electron wave- 
packet to miss the parent ion!®”°. That characteristic has been used 
in the generation of isolated attosecond pulses” and in imaging of 
molecular orbitals”*. Our observation of strong ellipticity dependence 
of harmonics in RGS opens up similar opportunities for attosecond 
pulse generation and imaging of electronic wavefunctions in the 
solid state. 

We have measured high harmonics from solid and gas-phase 
Ar and Kr. The solid-state harmonics include multiple plateaus 
whose high-energy cutoff extends beyond the atomic limit. This 
demonstrates the importance of solid-state effects even in the weakly 
bound van der Waals solids. We show that the multiple plateaus can 
be accommodated in a single-particle picture, similar to the single- 
active electron models used to calculate gas-phase harmonics in both 
atomic and molecular systems, in which only one electron is assumed 
to interact with the laser field'*"°. We further note that the appearance 
of the second plateau occurs at very nearly twice the exciton energy 
(~24 eV in Ar and ~20 eV in Kr). Although this could be indicative
of many-body effects beyond the single-particle picture, such a model
is not required to capture the general features seen in the experiment. 
In either case, the important difference between the solid and the gas 
is that the solid involves transitions between bound states created by 
band folding due to the periodic potential. In this sense there is no 
free-particle continuum for electrons in the solid, or equivalently, 
even in the case of RGS, the effect of the periodic potential cannot 
be neglected. 

METHODS

Experiments. The RGS samples were grown inside an ultrahigh vacuum
(<10⁻⁷ mbar) chamber at a temperature of 20 K for Ar and 27 K for Kr.
A closed-circuit cryostat was used to cool down a silicon wafer with 30-nm-thick 
silicon nitride windows (0.5mm x 0.5mm). The silicon nitride was used as the 
substrate for growing the crystals. The Ar and Kr crystals were grown at a rate of
about 1 µm min⁻¹ and the thickness was measured using thin-film interference
of a HeNe laser beam. Growth occurred in the absence of the strong-field infrared
pump beam. The silicon nitride windows were ablated at the focal spot by
the strong-field pump subsequent to growth, leaving a free-standing RGS film. 
Sublimation of the samples was found to be negligible over the course of the 
exposure by the laser when excited below the damage threshold. The damage 
threshold was determined to be ~30 TW cm⁻² for solid Ar and ~15 TW cm⁻²
for solid Kr by both visual inspection and loss of the harmonic emission and for
the conditions reported here. The harmonic spectra were measured using a home- 
built spectrometer consisting of a flat-field imaging grating (Hitachi 001-0639) 
and micro channel plate (MCP) detection system. The MCP was mounted on 
movable bellows which permitted us to observe different spectral regions between 
10 eV and 45 eV. The high energy cutoff was determined by the intensity at which 
the highest harmonic is approximately three times the noise on the MCP, ignor- 
ing peaks that are inconsistent across multiple measurements. We note that the 
intensities used here are about an order of magnitude lower than in the typical 
gas-phase HHG experiments, and therefore reaching the fully phase-matched 
conditions in the gas would require a much higher pressure than we can achieve 
with our experimental setup. Because of that technical limitation, we performed 
measurements at relatively high peak intensity (>20 TW cm⁻² in Ar gas), and
extrapolated the high-energy cutoff scaling results to the moderate peak intensity
scales. The intercept on the cutoff energy axis corresponds to the ionization
energy threshold, as expected. For the ellipticity dependence measurements, the 
polarization was varied using a quarter-wave plate. The peak field along the major 
axis was kept constant by adjusting the pulse energy. 

The Ar films were characterized under the same growth conditions by X-ray 
diffraction at the Stanford Synchrotron Radiation Lightsource and the size of the 
crystal grains was found to be at least 100nm. The sample thickness was chosen 
empirically to be thick enough to provide mechanical stability against the strong- 
field excitation and substrate ablation, but thin enough to mitigate propagation 
effects, including cascaded nonlinear wavemixing, which we observed to depend 
on film thickness’’. The focal spot upon propagation through the solid film and 
divergence of the harmonics were measured to be independent of incident inten- 
sity, falling within 10% of their nominal values. 

Calculations. The argon band structure and dipole transition elements are calcu- 
lated using DFT. The DFT calculations employ the linear augmented plane-wave 
method implemented in Wien2k”*. The calculations use a muffin-tin radius of 
3.0 Bohr, a k-point grid of 33 x 33 x 33 in the Brillouin zone, and a plane-wave 
energy cutoff of 50 atomic units. The DFT code uses the Perdew-Burke-Ernzerhof
generalized gradient approximation (GGA) functional”*. Since DFT produces an 
energy gap that is too small, a modified Becke-Johnson correction” is applied to the 


conduction-bands energies to obtain better agreement with the experimental band 
structure. 

We next use the band structure and the transition matrix elements to solve 
the time-dependent Schrédinger equation (TDSE) in k-space, in which different 
k-values are decoupled. The harmonic spectrum is obtained as the Fourier trans- 
form of the time-dependent current calculated from the TDSE solution!®. Our 
initial condition is a delocalized Bloch wavefunction (only k= 0) located at the 
highest symmetric point, Γ, on the valence band. For simplicity we include only
the four lowest strongly coupled bands, meaning that we are solving the TDSE 
for a four-level system, in the Bloch basis!°. We have established that including 
additional higher bands makes only a negligible change to the harmonic spec- 
trum, whereas excluding any of the four lowest bands makes a large difference. 
We adjust the dipole transition matrix elements so that the couplings between 
levels 1 and 2, between levels 2 and 3, and between levels 3 and 4 are roughly 
equal (in atomic units the couplings are μ_{1,2} = 0.62, μ_{2,3} = 0.41, and μ_{3,4} = −0.61)
and ignore all other couplings. We justify this adjustment on the grounds that the 
experiment averages over all orientations, and precise dipole-matrix elements for 
higher bands are difficult to extract from DFT. The four strongly coupled levels 
in our model represent the simplest possible description of the coupling between 
clusters of bands, which would be expected to occur in steps, with approximately 
equal coupling strengths. 

In the four-level picture, the multiple plateaus simply come from transitions 
between the instantaneous, field-dressed, eigenstates of the system. For a four-level 
system driven by a laser field F(t), the Hamiltonian can be written as

$$H(t) = \begin{pmatrix} \omega_1 & \mu_{12}F(t) & 0 & 0 \\ \mu_{12}F(t) & \omega_2 & \mu_{23}F(t) & 0 \\ 0 & \mu_{23}F(t) & \omega_3 & \mu_{34}F(t) \\ 0 & 0 & \mu_{34}F(t) & \omega_4 \end{pmatrix}$$

where ω_i is the energy of band i and μ_ij is the dipole transition element between levels
i and j. The energies of the instantaneous eigenstates E_i can be calculated by diagonalizing
the Hamiltonian. Then the cutoff energy for the first plateau will be given
by the maximum energy difference between the field-dressed levels, (E_2 − E_1)_max,
which scales approximately linearly with the laser field strength'®. Likewise the
cutoff energy for the second plateau will be (E_3 − E_1)_max. This prediction agrees
very well with the two harmonic cutoffs visible in Fig. 2b.
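
The field-dressed cutoff estimate can be sketched by diagonalizing the four-level Hamiltonian over the range of field values reached during a laser cycle. The dipole couplings below are those quoted in the text; the band energies are placeholder values chosen purely for illustration (the values used in the calculation appear only in the figure inset), and the function names are hypothetical.

```python
import numpy as np

# Dipole couplings in atomic units, as quoted in the text.
MU_12, MU_23, MU_34 = 0.62, 0.41, -0.61

# Band energies at the zone centre in atomic units: PLACEHOLDER values for
# illustration only; they are not taken from the paper.
OMEGA = np.array([0.0, 0.5, 0.8, 1.1])

def hamiltonian(field):
    """Four-level Hamiltonian with nearest-level dipole couplings."""
    h = np.diag(OMEGA).astype(float)
    for i, mu in enumerate((MU_12, MU_23, MU_34)):
        h[i, i + 1] = h[i + 1, i] = mu * field
    return h

def dressed_levels(field):
    """Instantaneous eigenenergies E_1 <= ... <= E_4 at a given field value."""
    return np.linalg.eigvalsh(hamiltonian(field))

def plateau_cutoffs(peak_field, n_points=201):
    """Maximum (E_2 - E_1) and (E_3 - E_1) over field values |F| <= peak_field."""
    fields = np.linspace(-peak_field, peak_field, n_points)
    levels = np.array([dressed_levels(f) for f in fields])
    first = (levels[:, 1] - levels[:, 0]).max()
    second = (levels[:, 2] - levels[:, 0]).max()
    return first, second

first_cutoff, second_cutoff = plateau_cutoffs(peak_field=0.03)
```

Because the Hamiltonian is tridiagonal, its spectrum depends only on the squares of the off-diagonal terms, so the sign of the field (and of μ_{3,4}) does not affect the dressed energies in this sketch.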


Negative capacitance in multidomain ferroelectric superlattices


Pavlo Zubko, Jacek C. Wojdel, Marios Hadjimichael, Stéphanie Fernandez-Pena, Anais Sené, Igor Luk’yanchuk,
Jean-Marc Triscone & Jorge Íñiguez


The stability of spontaneous electrical polarization in ferroelectrics 
is fundamental to many of their current applications, which range 
from the simple electric cigarette lighter to non-volatile random 
access memories!. Research on nanoscale ferroelectrics reveals 
that their behaviour is profoundly different from that in bulk 
ferroelectrics, which could lead to new phenomena with potential 
for future devices?~*. As ferroelectrics become thinner, maintaining 
a stable polarization becomes increasingly challenging. On the other 
hand, intentionally destabilizing this polarization can cause the 
effective electric permittivity of a ferroelectric to become negative’, 
enabling it to behave as a negative capacitance when integrated in a
heterostructure. Negative capacitance has been proposed as a way of 
overcoming fundamental limitations on the power consumption of 
field-effect transistors®. However, experimental demonstrations of 
this phenomenon remain contentious’. The prevalent interpretations 
based on homogeneous polarization models are difficult to reconcile 
with the expected strong tendency for domain formation®’, but 
the effect of domains on negative capacitance has received little 
attention®!°-!?, Here we report negative capacitance in a model 
system of multidomain ferroelectric—dielectric superlattices across a 
wide range of temperatures, in both the ferroelectric and paraelectric 
phases. Using a phenomenological model, we show that domain- 
wall motion not only gives rise to negative permittivity, but can also 
enhance, rather than limit, its temperature range. Our first-principles- 
based atomistic simulations provide detailed microscopic insight into 
the origin of this phenomenon, identifying the dominant contribution 
of near-interface layers and paving the way for its future exploitation. 

Negative capacitance in ferroelectrics arises from the imperfect 
screening of the spontaneous polarization®!*'>*, Imperfect screening is 
intrinsic to semiconductor-ferroelectric, and even metal-ferroelectric, 
interfaces because of their finite effective screening lengths'>'®. 
Alternatively, imperfect screening can be engineered in a controlled 
manner by deliberately inserting a dielectric layer of relative permit- 
tivity eq between the ferroelectric and the electrodes® (Fig. 1a). The 
physical separation of the ferroelectric bound charge from the metallic 
screening charges creates a depolarizing field inside the ferroelectric, 
destabilizing the polarization and lowering the ferroelectric transition 
temperature Tc. The effect of the dielectric layer can be understood 
by considering the free energy of the bilayer capacitor with the usual 
assumption of a uniform polarization P (see Methods). Below the 
bulk transition temperature T_0, the free energy of the ferroelectric
layer develops a double-well with minima at finite values of P, but,
when combined with the parabolic potential of the dielectric layer, the
total energy has a minimum at P = 0 (Fig. 1b). The reciprocal dielectric
constant of the system as a whole, ε^-1, given by the curvature of the
total energy with respect to the polarization, is positive, as required for 
thermodynamic stability. However, because the non-polar state of the
ferroelectric layer corresponds to a maximum of its local energy, the
local stiffness of the ferroelectric layer is negative; that is, polarizing the 
ferroelectric layer has a negative energy cost. 
Figure 1 | Phenomenological description of negative capacitance.
a, Sketch of the ferroelectric-dielectric bilayer capacitor with (right) and
without (left) domains. Green, blue and grey layers correspond to the
dielectric, ferroelectric and metallic components, respectively. b, The total
(pink) and local free energies F of the ferroelectric (blue) and dielectric
(green) layers. c, Temperature T dependence of the local dielectric stiffness
of the ferroelectric layer calculated from phenomenological models with
uniform homogeneous polarization (blue), and with inhomogeneous
polarization with static, soft domain walls (red, dashed) and mobile,
abrupt (thin) domain walls (red, solid). The dotted line marks the
breakdown of the thin-wall Landau-Kittel model close to T_C^inh (see
Methods). P, polarization; T_C^h and T_C^inh, temperatures of ferroelectric
transitions to homogeneous and inhomogeneous states, respectively; T_0,
bulk transition temperature; ε_d and ε_f, dielectric constants of the dielectric
and ferroelectric layers, respectively.
With decreasing temperature, the ferroelectric double-well progressively
deepens and would dominate the total energy below a temperature
T_C^h, favouring a transition to a homogeneous ferroelectric state.
The local dielectric stiffness of the ferroelectric layer would then 
increase and eventually become positive, as shown by the blue curve in 
Fig. 1c. This ‘homogeneous’ model has served as the basis for the inter- 
pretation of previous experimental studies of negative capacitance!”~*°. 
However, although attractively simple, it does not describe the true 
ground state of the system because, in general, the depolarization field 
that leads to the negative capacitance effect will also tend to favour a
transition to an inhomogeneous, multidomain phase at a temperature
T_C^inh (T_C^inh > T_C^h), as demonstrated by numerous experiments (see, for
example, ref. 8). This has consequences for the dielectric response and
negative capacitance, as we show with the help of two phenomenological 
models. 

First, we use a Ginzburg-Landau approach (Methods) to obtain 
an analytic description of the phase transition into an inhomo- 
geneous state with a gradual (soft) polarization profile, typical of 
ultrathin films*'. This model allows us to obtain the lattice contri- 
bution to the dielectric response (that is, the response of a static 
domain structure; dashed curve in Fig. 1c). The appearance of the 
soft domain structure results in qualitative changes in the shape of the
ε_f^-1(T) curve, pushing its minimum below the actual transition
temperature. However, the overall effect of a static domain structure is to
reduce the temperature range of negative capacitance, as previously 
thought". 

Second, to investigate the contribution of domain-wall motion, we 
choose to work in the simpler Kittel approximation, which is valid 
for the abrupt (thin) domain walls typical of thicker films well below 
Tc (refs 5, 12, 22). The resulting dielectric response is shown by the 
solid red curve in Fig. 1c (for details of the calculation, see Methods). 
Remarkably, domain-wall motion contributes negatively to the overall 
dielectric stiffness*'®'?. Macroscopically, domain-wall displacements 
create a net polarization that leads to a depolarizing field, which dom- 
inates the total field in the ferroelectric, thus leading to negative capac- 
itance. Microscopically, the domain-wall displacements redistribute 
the interfacial stray fields resulting in a negative net contribution to 
the free energy and thus to the local dielectric constant. Although 
the thin-wall Kittel model does not capture the subtleties of the soft 
domain structure of ultrathin films, it highlights the importance of the 
contribution of domain walls to extending the temperature window of 
negative capacitance. 

To experimentally access the different temperature regimes of 
negative capacitance shown in Fig. 1c, we deposited several series of 
high-quality epitaxial superlattices consisting of n_f ferroelectric and n_d
dielectric monolayers repeated N times, hereafter labelled (n_f, n_d)_N. For
each superlattice series, n_f is fixed while n_d is varied from four to ten
unit cells. SrTiO3 crystals were used as substrates and epitaxial SrRuO3
top and bottom electrodes were deposited in situ to enable dielectric
impedance spectroscopy measurements. SrTiO3 was also chosen as the
dielectric component, and PbTiO3 and quasi-random Pb0.5Sr0.5TiO3
alloys were used as the ferroelectric layers. The Pb0.5Sr0.5TiO3
composition was chosen for its low T_0, enabling us to investigate the full
range of temperatures up to and above T_0 without complications arising
from leakage. 

Such superlattices constitute a model system for the observation of 
negative capacitance, because they are mathematically equivalent to the 
bilayer systems investigated theoretically>*”' and present a number of 
very convenient features—for example, the small layer thicknesses min- 
imize the number of free carriers, ensuring appropriate electrostatic 
boundary conditions, while the highly ordered stripe domains are well- 
suited for X-ray diffraction (XRD) studies and theoretical modelling. 
By varying the thicknesses of the dielectric layers and the total num- 
ber of bilayer repetitions, the permittivities of the individual layers can 
be extracted from measurements of the total capacitance of a series of 
samples. 
The dielectric properties of three Pb0.5Sr0.5TiO3-SrTiO3 superlattices
with 14-unit-cell-thick Pb0.5Sr0.5TiO3 layers are summarized in
Fig. 2a–d. All superlattices exhibit a broad maximum in the dielectric
response that moves to lower temperature with increasing SrTiO3
content (Fig. 2a). These maxima do not coincide with the phase-transition
temperature T_C and instead arise from the qualitatively different
temperature dependences of the permittivities of the SrTiO3 and
Pb0.5Sr0.5TiO3 layers. Using XRD, we obtain an estimate of T_C from the
temperature evolution of in-plane and out-of-plane lattice parameters
(a and c, respectively), as shown in Fig. 2b. Contrary to what is expected
for a transition to a homogeneous ferroelectric state, the observed T_C
is independent of the thickness of the SrTiO3 layer because T_C is
determined by the domain-wall density, which in turn depends only on the
thickness of the ferroelectric layer. The regular domain structure with
Figure 2 | Temperature dependence of the dielectric permittivities of
Pb0.5Sr0.5TiO3-SrTiO3 and PbTiO3-SrTiO3 superlattices. a, Total
dielectric constant ε of (14, n_d) Pb0.5Sr0.5TiO3-SrTiO3 superlattices as a
function of temperature T. Red, n_d = 4; blue, n_d = 7; green, n_d = 10.
b, Sample tetragonalities c/a used to determine T_C. c, Linear fits to the
series capacitor expression (n/ε ≈ n_d/ε_d + n_f/ε_f, where n = n_d + n_f, ε_d and ε_f
are the dielectric constants of the dielectric and ferroelectric layers, and n_d
and n_f are the numbers of dielectric and ferroelectric monolayers per
period of the superlattice) for a selection of temperatures. d, Reciprocal
dielectric constant of the Pb0.5Sr0.5TiO3 layers in (14, n_d) superlattices
calculated from the series capacitor model. Dashed line indicates our
estimate of T_0 obtained from the temperature dependence of c/a of a
thin Pb0.5Sr0.5TiO3 film. Inset, dielectric constant of the SrTiO3 layers.
e, Reciprocal dielectric constant of the PbTiO3 layers in (5, n_d) superlattices.
In d and e, the arrows indicate T_C with associated error bars representing
the spread of values between the samples in the series; grey shading
indicates estimated uncertainties obtained from weighted-least-squares
linear fitting with weights determined from inter-electrode variation at
room temperature within each sample.


23 JUNE 2016 | VOL 534 | NATURE | 525 


© 2016 Macmillan Publishers Limited. All rights reserved 


LETTER 


a periodicity of about 10-12 nm can be observed using XRD as peaks 
in the diffuse scattering around the superlattice Bragg reflections 
(Extended Data Fig. 1). 

To separate the dielectric constants ε_d and ε_f of individual layers we
apply the standard series capacitor expression, which for our
superlattices is n/ε ≈ n_d/ε_d + n_f/ε_f (see Methods), where n = n_d + n_f and ε is
the overall dielectric constant of the superlattice, obtained directly from
the measured capacitance. The linear relationship between n/ε and n_d
is well satisfied for 100 K ≲ T < 570 K, as illustrated in Fig. 2c for a few
selected temperatures. The dielectric constant of the SrTiO3 layers can
be obtained from the slopes of the plots in Fig. 2c. The resulting ε_d(T)
is presented in the inset of Fig. 2d and shows the typical decrease with
temperature observed in SrTiO3 thin films and bulk crystals. The
intercepts of the linear plots in Fig. 2c give (after division by n_f) the
reciprocal dielectric constant of the Pb0.5Sr0.5TiO3 layers, ε_f^-1, which is
plotted in Fig. 2d. At low temperature, deep in the ferroelectric regime,
ε_f^-1 is positive. However, upon heating it slowly decreases, entering the
negative capacitance regime around room temperature. It then reaches a
minimum, and subsequently returns to positive values at high temperature
in the paraelectric phase. The minimum in ε_f^-1 is observed well below the
phase-transition temperature T_C (indicated with an arrow in Fig. 2d),
contrary to what would be expected for a structure with homogeneous
polarization. For this series of samples, the temperature regime
T_C < T < T_0 cannot be resolved because T_C is very close to T_0
(measured independently to be around 500 K for Pb0.5Sr0.5TiO3 thin films of
the same composition). To access this temperature regime, a set of
(5, n_d) superlattices was fabricated with Pb0.5Sr0.5TiO3 replaced by
PbTiO3, which has a much higher T_0 ≈ 1,200 K when grown coherently
on SrTiO3 (ref. 23). As shown in Fig. 2e, the negative capacitance
regime can be clearly observed in the paraelectric phase above
T_C ≈ 580 K in these samples. However, above T_C the dielectric stiffness
increases much faster than expected, becoming positive far below T_0.
This is probably due to the progressive increase in the thermally
Figure 3 | Results of Monte Carlo simulations of a first-principles-based
model for the (8, 2) superlattice. a-c, Temperature T dependence of the
tetragonality c/a (a), local polarization P (b) and reciprocal dielectric
constant 1/ε_f (c) of the PbTiO3 layer. Solid lines are guides to the eye.

a, The dashed lines extrapolate the linear behaviour above and below 

the transition, and help us locate the elastic transition temperature. 

b, Supercell average of the absolute value of the local polarization 

(black circles), as well as the polarization at a particular cell within a 
domain considering its absolute (red squares) and bare (blue diamonds) 
values. The arrow in a marks the elastic transition and the onset of 
activated conductivity of the ferroelectric layers, which destroys the
electrostatic boundary conditions required for negative capacitance 
and leads to Maxwell-Wagner relaxations at high temperature 
(see Methods). 
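
The series-capacitor separation used above amounts to a straight-line fit of n/ε against n_d at each temperature: the slope is 1/ε_d and the intercept is n_f/ε_f. A minimal sketch follows, assuming an unweighted least-squares fit (the figure caption states that the actual analysis used weighted least squares) and made-up layer permittivities for the synthetic check; the function names are illustrative only.

```python
def fit_series_capacitor(n_d_values, eps_totals, n_f):
    """Unweighted least-squares fit of n/eps against n_d.

    Returns (eps_d, eps_f): the slope is 1/eps_d and the intercept is n_f/eps_f.
    """
    xs = [float(nd) for nd in n_d_values]
    ys = [(n_f + nd) / eps for nd, eps in zip(n_d_values, eps_totals)]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return 1.0 / slope, n_f / intercept


def total_permittivity(n_d, n_f, eps_d, eps_f):
    """Series-capacitor permittivity of one (n_f, n_d) period."""
    return (n_d + n_f) / (n_d / eps_d + n_f / eps_f)


# Synthetic check with made-up layer permittivities (not measured values).
N_F = 14
TRUE_EPS_D, TRUE_EPS_F = 300.0, -2000.0
N_D = [4, 7, 10]
EPS_TOT = [total_permittivity(nd, N_F, TRUE_EPS_D, TRUE_EPS_F) for nd in N_D]
EPS_D_FIT, EPS_F_FIT = fit_series_capacitor(N_D, EPS_TOT, N_F)
```

In this synthetic case every total permittivity is positive even though the ferroelectric layer's permittivity is negative, which is the situation the measurement relies on.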

To gain further insight, we used first-principles-based effective mod- 
els that permit the treatment of thermal effects. We used the potentials 
for PbTiO3 and SrTiO3 introduced in ref. 24 as the starting point to
construct models for PbTiO3-SrTiO3 superlattices with an in-plane
epitaxial constraint corresponding to a SrTiO3(001) substrate (see
Methods). As compared with experiment, our models feature relatively
stiff SrTiO3 layers and relatively low ferroelectric transition
temperatures; otherwise, they capture the behaviour of PbTiO3 layers stacked
with dielectric layers in a qualitatively and semi-quantitatively correct 
way. For computational feasibility, we focused on a representative (8, 2) 
superlattice (10 x 10 x 10 elemental perovskite cells in the periodically 
repeated simulation box) that presents the behaviour summarized in 
Fig. 3. As the superlattice is cooled from high temperature, the c/a ratio 
of the PbTiO3 layers (Fig. 3a) reveals an elastic transition, at about
490 K, to a state characterized by strongly fluctuating ferroelectric
domains (380-K snapshot provided in Fig. 3d and Supplementary 
Video 1). This fluctuating phase could be indicative of temperature- 
induced domain melting, analogous to vortex lattice melting in 
high-temperature superconductors”. As we further cool the 
superlattice, we observe a ferroelectric transition at 370 K associated 
with the freezing of the domains into stable stripes. This change can be 
appreciated in Fig. 3b, in which we plot different measures of the local 
dipole order inside the PbTiO3 layer. As shown in the bottom panel of
Fig. 3d, this low-temperature phase presents stripes along the [110] 
direction, with a domain thickness of about five unit cells and sharp 
walls. As shown in Fig. 3f and Extended Data Fig. 2, in the ground state, 
the dipoles form closure domains and almost do not penetrate into the 
stiff SrTiO3 layers. This corresponds to the vanishing of spontaneous 
polarization at the surface of a polydomain ferroelectric”®. The domain 
fluctuating polar order around 490 K; those in b mark the ferroelectric
freezing transition around 370 K as determined from the inflection points.
The high-temperature tails of the black and red curves in b reveal the
presence of incipient polar order. d, Snapshots of the local polarization
P (out-of-plane component) within the middle of the PbTiO3 layer at
380 K (top) and 240 K (bottom). e, Temperature dependence of the local
dielectric response 1/ε_i resolved along the stacking direction. f, Local
susceptibility χ_i map in the (110) plane at 320 K. The small arrows
represent the projection in the (110) plane of the local electric dipoles as
deduced from the equilibrium atomic configuration.
walls exhibit substantial Bloch character in the ground state; this is the
result of a wall-confined polarization along [110] that appears at about 
120 K and is analogous to the one recently predicted?’ for pure PbTiO3. 

We investigated the layer-resolved dielectric response of the
superlattices. In essence (more details in Methods), we compute the local
susceptibility of a region i,

$$
\chi_i = \frac{1}{\epsilon_0}\,\frac{\partial \langle P_i \rangle}{\partial E_{\mathrm{ext}}},
$$

in which ε_0 is the vacuum permittivity, P_i is the local polarization and
⟨...⟩ represents a thermal average that can be readily obtained by
simulating our models under an applied electric field E_ext. As shown in
Methods, the local dielectric constant can be expressed as
ε_i = ε_tot/(ε_tot − χ_i), where ε_tot is the dielectric constant of the whole
system. The results in Fig. 3c correspond to
such a calculation for the PbTiO3 layers of the (8, 2) superlattice, and
confirm the presence of a region of negative capacitance extending
above and below the ferroelectric transition temperature.

However, it is not immediately obvious where the computed negative
capacitance comes from. The local susceptibilities χ_i are always
positive in our calculations, confirming the expectation that an applied
external field induces polarization changes that are parallel to it. By
contrast, the local dielectric constant ε_i measures a response to a local field
that incorporates depolarizing fields, making its behaviour richer and
its physical interpretation more challenging”. In particular, ε_i will be
negative if χ_i > ε_tot. Hence, the negative capacitance regions are those
that are substantially more responsive than the system as a whole.
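
The sign rule in the preceding paragraph can be checked with a few lines of code. This is a minimal sketch of the relation ε_i = ε_tot/(ε_tot − χ_i) quoted above; the region names and susceptibility values are invented for illustration and are not simulation results.

```python
def local_dielectric_constant(chi_i, eps_tot):
    """Local dielectric constant eps_i = eps_tot / (eps_tot - chi_i)."""
    if chi_i == eps_tot:
        raise ValueError("eps_i diverges when chi_i equals eps_tot")
    return eps_tot / (eps_tot - chi_i)


def negative_regions(chi_by_region, eps_tot):
    """Names of regions whose local dielectric constant is negative."""
    return [name for name, chi in chi_by_region.items()
            if local_dielectric_constant(chi, eps_tot) < 0.0]


# Made-up susceptibilities for three regions of one ferroelectric layer.
EXAMPLE = {"interface": 450.0, "domain wall": 320.0, "domain interior": 80.0}
```

Calling `negative_regions(EXAMPLE, 200.0)` flags only the regions whose susceptibility exceeds the total dielectric constant, even though every susceptibility in the example is positive.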

Our formalism allows us to map out the local response within the
PbTiO3 layers and thus determine which regions are responsible for
the negative capacitance behaviour. Figure 3e shows ε_i^-1 resolved along
the superlattice-stacking direction and as a function of temperature. At
high temperatures, the material behaves like a normal dielectric. Then,
negative contributions to ε_i^-1 appear at about 550 K, well before any
ordering occurs in the system; in that regime, the negative contribution
is confined to the vicinity of the PbTiO3-SrTiO3 interface, and the
response of the whole PbTiO3 layer continues to be positive. As
temperature is further reduced, the negative capacitance region extends to
the whole PbTiO3 layer. Eventually, at low temperature, the inner part
of the PbTiO3 layer recovers a conventional dielectric behaviour that
dominates the total response, even if our simulations reveal that a
negative contribution from the interfaces still persists.

We can further map the susceptibility within the planes perpendic- 
ular to the stacking direction to quantify the contributions of domains 
and domain walls. Figure 3f shows representative results at 320 K. 
Predictably, we find that the susceptibility at the domain walls is much 
larger than at the domains. In other words, the field-induced polar- 
ization of the walls, which results in the growth or shrinkage of the 
domains, dominates the response. Further, the large response of the 
walls is much enhanced in the vicinity of the interfaces with the SrTiO3
layers. Hence, our simulations suggest that, below about 370 K, the
domain-wall region near the interfaces dominates the negative
capacitance of the PbTiO3 layers.

There are important differences between our simulated and experi- 
mental superlattices that complicate a detailed comparison (see 
Methods for more details). Nevertheless, our basic result, that the
PbTiO3 layers have a negative dielectric constant in a temperature
region extending above and below T_C, is confirmed by our
simulations. Further, we also ran simulations of various (8, n_d) superlattices
to mimic our experimental approach for calculating the response of the
PbTiO3 layer; the results shown in Extended Data Fig. 3 are similar to
those of Fig. 3c, thus validating our strategy to measure ε_f.

Finally, the depolarization effects in ferroelectric-dielectric super- 
lattices are completely analogous to those at interfaces between a fer- 
roelectric and a metal or a semiconductor. We have found that 
PbTiO3-SrRuO3 superlattices, for example, exhibit domain structures
very similar to those of PbTiO3-SrTiO3. These structures are induced by the
imperfect screening at the SrRuO3-PbTiO3 interfaces, which produces
a depolarizing field equivalent to that induced by a 7-unit-cell-thick
SrTiO; layer**”’. It is therefore reasonable to expect a negative 
capacitance effect of the same order of magnitude in a transistor-like
structure composed of a PbTiO3 gate dielectric and an ultrathin
conducting SrRuO3 channel, where applying a gate voltage V_g will lead to
an enhancement of the surface potential φ_s at the PbTiO3-SrRuO3
interface. With a PbTiO3-SrRuO3 interface capacitance C_i of about
0.6 F m^-2 (ref. 14) and a ferroelectric capacitance C_f equivalent to that
of one of our PbTiO3 layers, one can obtain voltage amplification factors

$$
\frac{\partial \varphi_s}{\partial V_g} = \frac{C_f}{C_i + C_f}
$$

as large as about two at temperatures at which ε_f^-1 is most
negative. For the more practical interface with a conventional
semiconductor, the expected amplification is more modest (for example,
∂φ_s/∂V_g ≈ 1.03 for C_s ≈ 0.1 F m^-2, ref. 11), but is still enhanced compared to
conventional gate dielectrics for which the corresponding value is less
than unity. Such enhancements are especially encouraging in light of
recent progress in the integration of ferroelectric oxides directly on
conventional semiconductors*”.


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


Landau theory for monodomain bilayers and superlattices. To derive the
expected temperature dependence of the dielectric function of a
ferroelectric-dielectric bilayer or superlattice undergoing a phase transition to a
homogeneous (monodomain) state, we consider the free energy of the bilayer
capacitor under short-circuit boundary conditions (or equivalently, one period
of a superlattice) of the form

$$
F = l_f\left(\frac{\alpha_f}{2}P^2 + \frac{\beta_f}{4}P^4 + \frac{\epsilon_0}{2}E_f^2\right) + l_d\,\frac{\epsilon_0\epsilon_d}{2}E_d^2 \qquad (1)
$$

The first term represents the energy density, per unit area, of a ferroelectric
material with a second-order phase transition at a temperature T_0 as determined by
the coefficient of the P² term, α_f = (T − T_0)/(Cε_0). The second term describes the
energy penalty for polarizing the dielectric layer with dielectric constant ε_d. Here
E_f and E_d are the electric fields appearing in the ferroelectric and dielectric layers,
respectively, when the spontaneous polarization P develops, C is the Curie constant,
β_f is the coefficient of the P⁴ term and l_d and l_f are the thicknesses of the dielectric and
ferroelectric layers, respectively. Taking into account the electrostatic boundary
conditions at the ferroelectric-dielectric interface, ε_d ε_0 E_d = ε_0 E_f + P, and the
short-circuit condition for the whole system, l_d E_d + l_f E_f = 0, the functional in equation (1)
can be rewritten in terms of only P with a renormalized overall P² coefficient
and the corresponding lowering of the transition temperature. The transition to a
homogeneous ferroelectric state is thus predicted to occur at a temperature


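$$
T_C^h = T_0 - \frac{C}{1 + \epsilon_d l_f / l_d}
$$

In particular, when ε_d ≫ 1 and l_d is of the order of l_f, equation (1) reduces to

$$
F \approx l_f\left(\frac{\alpha_f}{2}P^2 + \frac{\beta_f}{4}P^4\right) + l_d\,\frac{1}{2\epsilon_0\epsilon_d}P^2
$$

which describes the energy of a homogeneously polarized bilayer with equal
polarizations in both layers'””°. T_C^h then simplifies to

$$
T_C^h \approx T_0 - \frac{l_d\,C}{l_f\,\epsilon_d}
$$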


The overall electric susceptibility χ of such a system is given by

$$
\frac{l}{\chi} = \epsilon_0\,\frac{\partial^2 F}{\partial P^2} = \frac{l_d}{\chi_d} + \frac{l_f}{\chi_f}
$$

where l = l_f + l_d, and χ_d ≈ ε_d and χ_f = (α_f + 3β_f P²)^-1 ε_0^-1 are the electric
susceptibilities of the dielectric and ferroelectric layers, respectively. It has the familiar
form of the series capacitance formula, 1/C_tot = 1/C_d + 1/C_f. For high permittivity
materials such as those considered in this work, χ ≈ ε is a very good approximation.
The temperature dependence of the contribution to the reciprocal dielectric
constant from the ferroelectric layer is shown in Fig. 1c (blue curve). Although the total
permittivity (not shown) exhibits the typical divergence (ε^-1 = 0) at T_C^h and is
always positive, as required for thermodynamic stability, the dielectric stiffness of
the ferroelectric component decreases linearly with temperature upon cooling and
acquires negative values below T_0. At T_C^h, the spontaneous polarization appears
and the 3β_f P² term eventually restores ε_f to positive values at lower temperatures.
To obtain the blue curve in Fig. 1c, we modelled a 30-nm-thick PbTiO3 film in
series with a 10-nm-thick SrTiO3 layer using the following parameters: T_0 = 1,244 K
(strain-renormalized), C = 4.1 × 10^5 K and ε_d = 300, giving T_C^h = 788 K.
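
The monodomain estimate above can be reproduced with a short calculation. This is a minimal sketch under stated assumptions: it uses the simplified expression for the homogeneous transition temperature (T_0 lowered by l_d C/(l_f ε_d)) together with χ ≈ ε, and below the transition it inserts the equilibrium polarization of the renormalized functional, which gives ε_0·3β_f P² = 3(T_C^h − T)/C. That last step is inferred from equation (1) rather than quoted from the text, and the function names are illustrative.

```python
# Parameters quoted in the text for the blue curve of Fig. 1c.
T0 = 1244.0        # bulk transition temperature (K)
CURIE = 4.1e5      # Curie constant (K)
EPS_D = 300.0      # dielectric constant of the SrTiO3 layer
L_F = 30.0         # ferroelectric thickness (nm)
L_D = 10.0         # dielectric thickness (nm)


def tc_homogeneous(t0=T0, curie=CURIE, eps_d=EPS_D, l_f=L_F, l_d=L_D):
    """Transition temperature to the homogeneous state (simplified form)."""
    return t0 - curie * l_d / (eps_d * l_f)


def ferro_stiffness(temperature):
    """Reciprocal dielectric constant of the ferroelectric layer (chi ~ eps)."""
    tc = tc_homogeneous()
    stiffness = (temperature - T0) / CURIE  # eps0 * alpha_f
    if temperature < tc:
        # Spontaneous polarization: eps0 * 3 * beta_f * P^2 = 3 (tc - T) / C.
        stiffness += 3.0 * (tc - temperature) / CURIE
    return stiffness
```

With the quoted parameters `tc_homogeneous()` evaluates to about 788 K, and `ferro_stiffness` is positive above T_0, negative between T_0 and somewhat below the transition, and positive again at low temperature, which is the shape described for the blue curve.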
Landau-Kittel model of domain-wall contribution to permittivity. For an
isolated ferroelectric slab of thickness l_f in zero applied field, the up- and
down-oriented 180° domains are of equal width w, given by the Landau-Kittel
square-root dependence*!”. For high-ε ferroelectrics
where ε_∥ and ε_⊥ are the bulk lattice dielectric constants parallel and perpendicular
to the polarization, respectively, ξ is the coherence length, ς = 1 + ε_d/(ε_∥ε_⊥)^1/2 and
C 3.53 (refs 5, 12, 26, 33, 34). This equation also holds for ferroelectric films with 
‘dead layers’ and ferroelectric—dielectric superlattices, provided that the dielectric 
layers are thick enough compared to the domain width to allow the interfacial 
stray fields to decay sufficiently. Upon application of a field, the ferroelectric layer 
develops a net polarization due to the dielectric response of the lattice, described by
ε_∥, and to the motion of domain walls. To calculate the domain-wall contribution,
one must find the field-induced changes to the stray depolarizing fields, as has been
done in refs 5, 12, 22. The resulting effective dielectric constant of the ferroelectric 


can be expressed as!” 
Tv €e, ik de 
4In(2)\) ey w 


where the first term is the lattice response and the second term is the negative 
contribution from domain-wall motion. Within the limits of the validity of the 
Landau-Kittel theory, l_f/w is large and therefore the second term is dominant. This
term originates from the field-induced changes in the inhomogeneous electric-field 
distribution at the interface between the ferroelectric and the dielectric layers, 
consistent with the findings of our atomistic calculations. 

The temperature dependence of w and ε_∥ can be estimated* using the standard
critical Ginzburg-Landau expansions near T_0


by 
(1- T/T)! 


ef El 
where κ_∥ is related to the Curie constant C via κ_∥ = C/T_0 and ξ_0 is the atomic-scale
coherence length at T = 0. Assuming that ε_⊥ is temperature independent, the
domain width w is almost temperature independent*!*, whereas the
approximate temperature dependence of ε_f is sketched in Fig. 1c. The solid red curve in
Fig. 1c was calculated for a 30-nm-thick film with the following parameters
(corresponding roughly to those of strained PbTiO3): T_0 = 1,244 K, C = 4.1 × 10^5 K,
ε_⊥ = 120 and 2ξ_0 = 1 nm.

Ginzburg-Landau theory of polydomain bilayers and superlattices. The
critical temperature of transition to the inhomogeneous striped domain state can
be calculated within Ginzburg-Landau theory”**?

$$
T_C^{inh} = (1 - \pi\tau)\,T_0
$$

where τ = (C/(T_0 ε_⊥))^1/2 × 2ξ_0/l_f. For a 30-nm-thick PbTiO3 film, we obtain
T_C^inh ≈ 1,030 K. A similar expression (up to a numerical factor) can be obtained on
the qualitative level by noting that, at T_C^inh, the domain width w becomes
comparable with the domain-wall thickness 2ξ(T).
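
As a numerical check of the estimate above, the following sketch evaluates τ and the transition temperature to the striped state for the quoted film parameters. It assumes that the prefactor multiplying τ is π, a reading that reproduces the quoted value of about 1,030 K, and that ε_⊥ = 120 and 2ξ_0 = 1 nm are the relevant inputs; the function names are illustrative.

```python
import math


def reduced_tau(curie, t0, eps_perp, xi0_nm, l_f_nm):
    """tau = sqrt(C / (T0 * eps_perp)) * 2 * xi0 / l_f (dimensionless)."""
    return math.sqrt(curie / (t0 * eps_perp)) * 2.0 * xi0_nm / l_f_nm


def tc_inhomogeneous(curie=4.1e5, t0=1244.0, eps_perp=120.0,
                     xi0_nm=0.5, l_f_nm=30.0):
    """Transition temperature to the striped domain state, (1 - pi*tau) * T0."""
    tau = reduced_tau(curie, t0, eps_perp, xi0_nm, l_f_nm)
    return (1.0 - math.pi * tau) * t0
```

For the default arguments `tc_inhomogeneous()` returns a value within a few kelvin of 1,030 K, well above the homogeneous estimate of 788 K obtained for the bilayer in the monodomain model.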

Close to T_C^inh, the Landau-Kittel thin-wall approximation breaks down as the
domain profile becomes soft (we represent this region by the dotted line in Fig. 1c). 
The theory for mobile domain walls in this regime is challenging, but the lattice 
part of the response of the polydomain structure can be calculated analytically. 
This would correspond to a situation in which domain-wall motion is impeded, 
for instance, by pinning of the domain walls. Using Ginzburg-Landau theory, this 
contribution can be expressed as*° 


$$
\epsilon_f(T) = \frac{2\epsilon_\parallel(T)}{3\langle P^2\rangle / P_0^2 - 1}
$$

which is a generalization of equation (2). Here, P_0(T) = P_0(0)(1 − T/T_0)^1/2 is the
normalized temperature-dependent polarization of the bulk short-circuited sample and
⟨P²⟩ = ⟨P²(x, y)⟩ is the spatial average of the temperature-dependent polarization
profile of the domain state with critical temperature T_C^inh. The factor ⟨P²⟩ can be
calculated over a wide temperature interval that includes both soft and abrupt
(thin) domain profiles using the universal expression for P(x, y) in terms of elliptic
sn functions, as given in equation (7) of ref. 21. After spatially averaging we
obtain
where F(x) = [K(m(x)) − E(m(x))]²/m² × tanh(0.357x), K(m) and E(m) are the
complete elliptic integrals of the first and second kind, respectively, with
m(x) ≈ tanh(0.277x) and τ = (C/(T_0 ε_⊥))^1/2 × 2ξ_0/l_f. Note that F(0) = 0 and that the
above expression matches the relative permittivity of the paraelectric state
ε(T) = C/(T − T_0) at T = T_C^inh. The temperature dependence of ε_f^-1 is shown by
the dashed red line in Fig. 1c.

Sample preparation. Superlattices were deposited on monocrystalline SrTiO3(100)
substrates using off-axis radiofrequency magnetron sputtering. PbTiO3 and
SrTiO3 were deposited at a substrate temperature of 520 °C in an O2/Ar
mixture of ratio 5/7 and total pressure of 180 mTorr. For SrRuO3 layers, acting as
top and bottom electrodes, the corresponding parameters were 635 °C, 1/20 and
100 mTorr. Pb0.5Sr0.5TiO3 layers were deposited by sequential sputtering of
sub-monolayer amounts of SrTiO3 and PbTiO3. The Pb0.5Sr0.5TiO3-SrTiO3
superlattices were asymmetrically terminated with bottom SrRuO3-Pb0.5Sr0.5TiO3 and
top SrTiO3-SrRuO3 interfaces. By contrast, the PbTiO3-SrTiO3 superlattices were
symmetrically terminated with both metal-insulator interfaces being between
SrTiO3 and SrRuO3; the thickness of interfacial SrTiO3 layers was chosen to be half
of those in the superlattice interior to maintain a constant overall composition. For
each series of superlattices, the thickness of the ferroelectric layers was fixed (14
unit cells for Pb0.5Sr0.5TiO3 and 5 unit cells for PbTiO3), whereas the thickness of
the SrTiO3 layers was varied from 4 unit cells to 10 unit cells. The number of
repetitions N was chosen to maintain the total superlattice thickness as close as possible
to 100 nm for PbTiO3-SrTiO3 superlattices and 200 nm for Pb0.5Sr0.5TiO3-SrTiO3
superlattices. To extract the interface capacitance contribution, a series of (5, 8)
PbTiO3-SrTiO3 superlattices with N = 10, 19 and 30 was used.

The top SrRuO3 layers were patterned using ultraviolet photolithography and
etched using an Ar ion beam to form a series of 240 µm × 240 µm capacitors.
Structural characterization was performed using a PANalytical X'Pert PRO
diffractometer equipped with a triple axis detector and an Anton Paar domed heating
stage. Dielectric impedance spectroscopy in the 100-Hz to 2-MHz frequency range
was performed using an Agilent E4980A Precision LCR meter in a tube furnace
with a custom-made sample holder under continuous O2 flow at atmospheric
pressure.
Structural analysis. Specular θ-2θ scans were used to determine the periodicity
of the superlattice (Extended Data Fig. 1a), whereas rocking curves were used to
confirm the presence of domains and determine their periodicity (Extended Data
Fig. 1b, c). Temperature evolution of the lattice parameters was obtained from
θ-2θ scans and used to determine the phase-transition temperatures, taken to be
the crossing point of linear fits to the high- and low-temperature data (see Fig. 2b).
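The crossing-point criterion can be illustrated with a short sketch. This is not the authors' analysis code: the function names `fit_line` and `crossing_temperature` are hypothetical, and the decision about which data points belong to the low- and high-temperature branches is assumed to be made by the user beforehand.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def crossing_temperature(low_branch, high_branch):
    """Temperature at which the two fitted lines intersect.

    low_branch, high_branch: lists of (temperature, lattice_parameter)
    pairs, already split by the user (an assumption of this sketch).
    """
    a_low, b_low = fit_line(*zip(*low_branch))
    a_high, b_high = fit_line(*zip(*high_branch))
    if a_low == a_high:
        raise ValueError("fitted lines are parallel; no crossing point")
    return (b_high - b_low) / (a_low - a_high)
```

Called with two lists of (T, c) pairs, `crossing_temperature` returns the estimated transition temperature in the same units as T. The result depends on how the branches are chosen, so in practice the fitting windows should stay clear of the rounded region near the transition.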
Calculation of the permittivities of the individual layers. The total measured
capacitance of the sample $C_{\mathrm{tot}}$ has contributions from the superlattice $C_{\mathrm{SL}}$ and the
two metal-dielectric interfaces $C_i$:

$$\frac{1}{C_{\mathrm{tot}}} = \frac{1}{C_{\mathrm{SL}}} + \frac{2}{C_i} = \frac{1}{C_f} + \frac{1}{C_d} + \frac{2}{C_i}$$

where $C_f$ and $C_d$ are the total (series) capacitances of all the ferroelectric (PbTiO3)
and dielectric (SrTiO3) layers, respectively. For an $(n_f, n_d)_N$ superlattice

$$\frac{d}{\varepsilon} = \frac{d_f}{\varepsilon_f} + \frac{d_d}{\varepsilon_d} + \frac{2}{c_i}, \qquad d_{d,f} = N n_{d,f}\, c_{d,f}$$

where $c_i = C_i/(\varepsilon_0 A)$, $A$ is the sample area, $d_{d,f}$ are the total thicknesses of the
dielectric and ferroelectric components, respectively, $d = d_d + d_f$ and $c_{d,f}$ are the
lattice constants of the dielectric and ferroelectric layers, respectively. Because
$c_f \approx c_d \approx c = (n_d c_d + n_f c_f)/(n_d + n_f)$, this reduces, with $n = n_d + n_f$, to

$$\frac{n}{\varepsilon} = \frac{n_f}{\varepsilon_f} + \frac{n_d}{\varepsilon_d} + \frac{2}{N c\, c_i}$$

For a series of superlattices with a fixed $n_f$ and $n_d$, but varying $N$, the interfacial
contribution $1/c_i$ can be obtained from the slope of a plot of $n/\varepsilon$ versus $1/N$.
Once the temperature dependence of the interfacial capacitance is known, the
permittivities of the individual SrTiO3 and PbTiO3 layers can be obtained, using a
series of samples with fixed $n_f$ and varying $n_d$, from the slope and intercept of the
plot of $n/\varepsilon - 2/(N c\, c_i)$ versus $n_d$.
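The two-step extraction can be sketched as follows. The sketch assumes the series-capacitor relation n/ε = n_f/ε_f + n_d/ε_d + 2/(N c c_i), with n = n_f + n_d and c_i = C_i/(ε_0 A), which is a reconstruction from the definitions in this section and not a quotation of the original equations. All function and variable names are hypothetical, and plain unweighted least squares is used for brevity.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def interface_slope(n_values, n_over_eps):
    """Step 1: fixed (n_f, n_d), varying N.

    The slope of n/eps versus 1/N equals 2/(c * c_i) under the assumed
    relation, i.e. the interface term per inverse repetition.
    """
    slope, _ = fit_line([1.0 / n for n in n_values], n_over_eps)
    return slope


def layer_permittivities(n_f, n_d_values, n_values, n_over_eps, slope_i):
    """Step 2: fixed n_f, varying n_d.

    After subtracting the interface term slope_i / N, the slope of the
    corrected data versus n_d is 1/eps_d and the intercept is n_f/eps_f.
    Returns (eps_f, eps_d).
    """
    corrected = [y - slope_i / n for y, n in zip(n_over_eps, n_values)]
    slope, intercept = fit_line(n_d_values, corrected)
    return n_f / intercept, 1.0 / slope
```

A negative intercept in step 2 corresponds to a negative ε_f, so the sign of the extracted ferroelectric permittivity follows directly from the fit without any additional assumption.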

This analysis relies on the assumption that the layer permittivities do not change 
as the thicknesses of the individual layers are varied within each superlattice series. 
It is therefore crucial that the thickness of the ferroelectric layers is held fixed, 
because it determines the periodicity of the ferroelectric domain structure and thus 
the ferroelectric transition temperature and the domain-wall contribution to the 
measured dielectric constant. All superlattices within a series must also be in the 
same regime of electrostatic coupling, which places a lower limit on the thickness
of the SrTiO3 layers of around 3-4 unit cells.

To quantify the interface contribution $C_i$ for the PbTiO3-based superlattices, a
series of symmetrically terminated samples with a fixed period (5, 8), but varying
number of repetitions N, was fabricated. The interface capacitance was extracted
from the slope of the plot of $n/\varepsilon$ versus $1/N$, as discussed above, and is shown
as a function of temperature in Extended Data Fig. 4. At room temperature, $C_i$ is
around 1,000 fF µm⁻², which is in excellent agreement with previous experimental
work and compares quite well with the density functional theory prediction
of 615 fF µm⁻² (at 0 K) for the same interface. The weak dependence of $C_i$ on
temperature is also consistent with previous reports. Quantifying the interfacial
contribution independently in this way allows us to extract the temperature range
of the negative capacitance regime more reliably. As illustrated in Extended Data
Fig. 4, the interfacial contribution does not change the qualitative behaviour of the
extracted PbTiO3 dielectric constant and makes only a small (within error bars)
difference to the extracted PbTiO3 stiffness. It is thus reasonable to neglect this
correction, as was done in Fig. 2.

Impedance analysis. The observation of negative capacitance relies on the elec- 
trostatic interactions between the ferroelectric and dielectric layers, which in turn 
require both materials to be sufficiently insulating to avoid the screening of the 
spontaneous polarization. To identify the origin of dielectric losses and to quantify 
the conductivity of our samples, we measured complex impedance spectra over a 
wide range of frequencies from 100 Hz to 2 MHz and performed equivalent-circuit 
modelling. We present the complex impedance $Z(\omega) = Z' + iZ''$ (in which $\omega$ is the
angular frequency, $Z' = \mathrm{Re}(Z)$ and $Z'' = \mathrm{Im}(Z)$) data in the complex capacitance
representation $C(\omega) = C' + iC'' = 1/[i\omega Z(\omega)]$ (with $C' = \mathrm{Re}(C)$ and $C'' = \mathrm{Im}(C)$), as
is common for capacitive systems.

Each PbTiO3 and SrTiO3 layer in the superlattice, as well as the two metal-dielectric
interfaces, can be considered as a parallel R-C element, with a capacitance
$C_i$ and a resistance $R_i$ due to the finite conductivity of the layer. The superlattice
is then modelled by connecting these R-C elements in series, as shown in the
inset of Extended Data Fig. 5. An additional series resistance $R_s$ (typically a few
hundred ohms) accounts for the contact resistances and other sources of resistance
in the external circuit.
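A minimal numerical sketch of this equivalent circuit is given below, assuming ideal lumped elements. The function names and the `(R, C)` tuple layout are illustrative choices, not part of the original work, and no fitting routine is included.

```python
def impedance(omega, elements, r_s=0.0):
    """Impedance of parallel R-C elements connected in series.

    omega: angular frequency in rad/s.
    elements: list of (R_ohm, C_farad) tuples, one per layer or interface.
    r_s: additional series resistance in ohms.
    """
    z = complex(r_s, 0.0)
    for r, c in elements:
        # A resistor in parallel with a capacitor: Z = R / (1 + i*omega*R*C)
        z += r / (1.0 + 1j * omega * r * c)
    return z


def complex_capacitance(omega, elements, r_s=0.0):
    """Complex capacitance C(omega) = 1 / (i * omega * Z(omega))."""
    return 1.0 / (1j * omega * impedance(omega, elements, r_s))
```

With this definition a single lossy element gives Re(C) = C and Im(C) = -1/(ωR), so the imaginary part is negative and grows in magnitude at low frequency as the resistance falls. Elements with different R·C products produce the steps in Re(C) associated with Maxwell-Wagner relaxation.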

At low temperature, the conductivities of the PbTiO3 and SrTiO3 layers are
negligible and the whole system behaves as a single capacitance $C = (\sum_i C_i^{-1})^{-1}$.
The measured $C'$ is frequency independent except for the high-frequency roll-off
due to the parasitic series resistance $R_s$. As shown in Extended Data Fig. 5 for a
(5, 8)$_{30}$ PbTiO3-SrTiO3 superlattice, even at 500 K, the data can be well modelled
by a single capacitor in series with $R_s$; the parallel resistance is too high to be
determined from the fit (that is, well above 10°). At higher temperatures, the 
superlattice conductivity increases, resulting in an increase in the dielectric loss
$C''$ at low frequencies. The 600-K data are modelled with one parallel R-C element
in series with $R_s$. Despite the high temperature and large electrode area
(240 µm × 240 µm), the total sample resistance is still 2 MΩ. However, at 700 K, the
total sample resistance drops to 8.8 kΩ. In addition, some layers become
substantially more conducting than others, giving rise to Maxwell-Wagner relaxations,
which can be observed as steps and plateaus in $C'(\omega)$. The behaviour can be
qualitatively captured by dividing the system into two blocks with different resistances,
each modelled as a parallel R-C element. However, to reproduce the more gradual
frequency dispersion, more R-C elements are needed (in this case three were
sufficient). At these temperatures, the samples are too conducting to maintain the
electrostatic conditions necessary for negative capacitance. The sample resistances
for all data shown in Fig. 2 were higher than 1 MΩ.
Atomistic simulations of PbTiO3-SrTiO3 superlattices. To construct the
first-principles models for the PbTiO3-SrTiO3 superlattices, we took advantage of
previously introduced potentials (ref. 24) for the bulk compounds, which give a
qualitatively correct description of the lattice-dynamical properties and structural phase
transitions of both materials. Then, we treated the interface between PbTiO3 and
SrTiO3 in an approximate way, relying on the following observations. (1) The
inter-atomic force constants in perovskite oxides such as PbTiO3 and SrTiO3 have
been shown to depend strongly on the identity of the involved chemical species
and weakly on the chemical environment. (Hence, for example, the interactions
between Ti and O are very similar in both PbTiO3 and SrTiO3.) (2) Except in
the limit of very-short-period superlattices, the main effects of the stacking are 
purely electrostatic and largely independent of the details of the interactions at 
the interfaces. (3) The main purely interfacial effects leading, for example, to the 
occurrence of new orders (such as those discussed in ref. 41) are related to the 
symmetry breaking, which permits new couplings that are forbidden by symmetry 
in the bulk case. Such qualitative symmetry-breaking effects are trivially captured 
by our potentials, even if the actual values of the interactions are approximate. 
(Similar approaches to treat ferroelectric superlattices and junctions can be found 
in the literature, ref. 42 being a representative case.) 

As a result of these approximations, we were able to construct our superlattice
potentials by using the models for bulk PbTiO3 and SrTiO3 to describe the
interactions within the layers, assuming a simple numerical average for the interactions
of the ion pairs touching or crossing the interface. For example, Ti-O
interactions in a TiO2 interface plane are computed as the average of the analogous Ti-O
interactions in PbTiO3 and SrTiO3. New interactions, such as those involving Pb
and Sr neighbours across the interface, are chosen so that the acoustic sum rules
are respected; in practice, their values are close to an average between the
analogous Sr-Sr and Pb-Pb pairs. Finally, the long-range dipole-dipole interactions
are governed by a bare electronic dielectric constant $\varepsilon_\infty$ that is taken as a weighted
average of the first-principles results for bulk PbTiO3 ($8.5\varepsilon_0$) and SrTiO3 ($6.2\varepsilon_0$),
with weights reflecting the composition of the superlattice.
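As a worked illustration of the composition-weighted average, the sketch below assumes that the weights are simply the fractions of PbTiO3 and SrTiO3 unit cells in one superlattice period. The text does not spell out the weighting, so this choice, like the function name, is an assumption.

```python
def eps_inf_superlattice(n_f, n_d, eps_f=8.5, eps_d=6.2):
    """Composition-weighted electronic dielectric constant, in units of eps_0.

    n_f, n_d: number of PbTiO3 and SrTiO3 unit cells per period.
    eps_f, eps_d: bulk values quoted in the text for PbTiO3 and SrTiO3.
    The layer-fraction weighting is an assumption of this sketch.
    """
    return (n_f * eps_f + n_d * eps_d) / (n_f + n_d)
```

Under this assumption an (8, 2) superlattice gets (8 × 8.5 + 2 × 6.2)/10 = 8.04 in units of ε_0, slightly below the bulk PbTiO3 value.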

The parameters of our models for bulk PbTiO3 and SrTiO3 were computed from
first principles as described in ref. 24. To model our PbTiO3-SrTiO3 superlattices,
we adjusted our models in the following ways. (1) We softened the model for
bulk SrTiO3 so that it has a dielectric permittivity $\varepsilon_{33}$ of about $300\varepsilon_0$ at room
temperature. We checked a posteriori that the SrTiO3 layers in the superlattices are
not as soft, which is probably a consequence of the modified electrostatic
interactions ($\varepsilon_\infty$) assumed, as described above. (2) We imposed an epitaxial constraint
corresponding to having a SrTiO3 (001)-oriented substrate; that is, we assume
in-plane lattice constants a = b = 3.901 Å, forming an angle γ = 90°. (3) We tweaked
the model for PbTiO3 so that it gives an out-of-plane polarization of 1.0 C m⁻²
at 0 K when subject to the epitaxial constraint just described. Care was needed
because the model of ref. 24 for bulk PbTiO3 becomes unstable when the epitaxial
constraint is used in combination with the change in $\varepsilon_\infty$. Nevertheless, it was
possible to obtain a stable model with the correct ground-state polarization by adjusting
the expansive hydrostatic pressure introduced in ref. 24 as an empirical correction:
instead of the −13.9 GPa used in ref. 24, here we used −11.2 GPa. Also, when we
use this model to simulate a film of PbTiO3 under the SrTiO3 epitaxial constraint,
we get a ferroelectric transition temperature of 460 K, which is slightly below the
temperature at which the fluctuating domains appear in the (8, 2) superlattice
(490 K). As in the case of SrTiO3, the difference between bulk material and
superlattice is probably caused by the different value of $\varepsilon_\infty$: we use a slightly larger value
for the pure film, which results in a weaker ferroelectric instability.

These approximations and adjustments allow us to construct models for
superlattices of arbitrary $(n_f, n_d)$ stacking. For the simulations, we used
periodically repeated supercells that contain 10 × 10 elemental perovskite units
in-plane, whereas out-of-plane they span one full superlattice period. Thus,
for example, for the (8, 2) superlattice, we used a simulation box that contains
10 × 10 × (8 + 2) × 5 = 5,000 atoms. We solved the models by running Monte Carlo
simulations comprising between 10,000 and 40,000 thermalization sweeps (longer
thermalization is needed in the vicinity of phase transitions) followed by 50,000
sweeps to compute thermal averages. The dielectric susceptibility was calculated
by applying a small out-of-plane electric field to the simulation box. We found
that, in this highly reactive system, this approach converged much faster than the
usual fluctuation formulas.

The low-temperature ground state of our (8, 2) superlattice is sketched in
Extended Data Fig. 2, in which the stripe domain structure is clearly visible.
This result closely resembles the one obtained directly from first-principles
calculations in the limit of 0 K; this agreement further supports the accuracy of
our model potential.
Calculation of local dielectric constants. In the following, we summarize the
derivation of formulas that relate the local response of each layer with the global
response of the superlattice. Here we are exclusively concerned with the response
along the superlattice stacking direction. We use a '0' superscript to refer to the
situation in which no external electric field is applied, and i to label the layers in
the superlattice. In the absence of free charges, the condition on the continuity of the
electric displacement implies

$$D_i^0 = D^0 = P_i^0 + \varepsilon_0 E_i^0$$

for all layers. $E_i$ is the total electric field acting on layer i. In general, this total field
can be split into local and external contributions, so that $E_i = E_{i,\mathrm{loc}} + E_{\mathrm{ext}}$. Naturally,
when no external field is applied, we simply have $E_i^0 = E_{i,\mathrm{loc}}^0$.

Additionally, if we have M layers in the repeated unit of the superlattice, the
periodicity of the potential implies that

$$\sum_{i=1}^{M} l_i E_i^0 = 0$$

in which $l_i$ is the thickness of layer i. Hence, in the absence of an applied field,
there is no net potential drop across the supercell. Then, we immediately get that

$$D^0 = L^{-1}\sum_{i=1}^{M} l_i D_i^0 = L^{-1}\sum_{i=1}^{M} l_i P_i^0 = P^0$$

in which $L = \sum_i l_i$ is one superlattice period and $P^0$ is the polarization of the
superlattice with no field applied. As a result, the electric field at layer i is

$$E_i^0 = E_{i,\mathrm{loc}}^0 = (P^0 - P_i^0)/\varepsilon_0$$

Now we consider an external electric field $E_{\mathrm{ext}}$. It is trivial to verify that the
field-induced variations in polarization, electric field and displacement satisfy

$$\Delta D_i = \Delta D = \Delta P_i + \varepsilon_0(\Delta E_{i,\mathrm{loc}} + E_{\mathrm{ext}}) = \Delta P + \varepsilon_0 E_{\mathrm{ext}}$$

$$\sum_{i=1}^{M} l_i\, \Delta E_{i,\mathrm{loc}} = 0$$

and

$$\Delta E_{i,\mathrm{loc}} = (\Delta P - \Delta P_i)/\varepsilon_0$$

Then, the dielectric constant of layer i is computed as

$$\varepsilon_i = \frac{1}{\varepsilon_0}\frac{\Delta D}{\Delta E_i} = \frac{\Delta P + \varepsilon_0 E_{\mathrm{ext}}}{\Delta P - \Delta P_i + \varepsilon_0 E_{\mathrm{ext}}} = \frac{\chi_{\mathrm{tot}} + 1}{\chi_{\mathrm{tot}} - \chi_i + 1}$$

where we have introduced the layer susceptibility

$$\chi_i = \frac{1}{\varepsilon_0}\frac{\Delta P_i}{E_{\mathrm{ext}}}$$

and, analogously, the total susceptibility $\chi_{\mathrm{tot}} = \Delta P/(\varepsilon_0 E_{\mathrm{ext}})$.
As a result, we have written all the relevant quantities in terms of the local
susceptibilities $\chi_i$, which is convenient at conceptual and practical levels. Conceptually,
$\chi_i$ is a quantity we expect to be positive in all cases, because an applied electric
field will create dipoles parallel to it. This basic local response of the material
is physically and intuitively clear, because it is free from the subtleties
(associated with the long-range electrostatic effects encapsulated in the local depolarizing
fields) that affect the dielectric constant. Practically, $\chi_i$ is very easy to compute
from a Monte Carlo simulation, whether by explicitly applying an electric field
and calculating the change in local polarization or by directly inspecting the
fluctuations of the local polarizations in the absence of an applied field. The latter approach
can be viewed as a generalization of the method described in, for example, ref. 43;
similar fluctuation formulas for ferroelectric nanostructures were introduced in
refs 13, 45.
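The relation between layer susceptibilities and layer dielectric constants can be made concrete with a small sketch. It assumes the result ε_i = (χ_tot + 1)/(χ_tot − χ_i + 1) and takes χ_tot as the thickness-weighted average of the χ_i, which follows from the superlattice polarization being the thickness-weighted average of the layer polarizations. The function name and the example numbers are hypothetical and carry no physical significance.

```python
def layer_dielectric_constants(chi, thickness):
    """Layer dielectric constants from layer susceptibilities.

    chi: list of layer susceptibilities chi_i (dimensionless).
    thickness: list of layer thicknesses l_i, in any common unit.
    Returns (chi_tot, [eps_i, ...]).
    """
    period = sum(thickness)
    # Total susceptibility as the thickness-weighted average of the layers
    chi_tot = sum(l * x for l, x in zip(thickness, chi)) / period
    eps = [(chi_tot + 1.0) / (chi_tot - x + 1.0) for x in chi]
    return chi_tot, eps


# Illustrative values only: a soft 8-cell layer and a stiffer 2-cell layer.
chi_tot, eps_layers = layer_dielectric_constants([120.0, 20.0], [8.0, 2.0])
```

A layer acquires a negative dielectric constant whenever its own susceptibility exceeds χ_tot + 1, even though every χ_i is positive. The layers still combine as capacitors in series to give the total dielectric constant χ_tot + 1.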

The layers labelled by i will typically correspond to actual PbTiO3 and SrTiO3
layers, but we could also further sub-divide our superlattice. For example, the
above formulas formally allow us to consider contributions from the interfaces, or
from different regions within a layer. This kind of subdivision was used to prepare
Fig. 3e, in which maps of the dielectric constant as a function of position along the
superlattice stacking direction are reported.

The dielectric susceptibility $\chi_i$ of a layer i can be viewed as a direct average of
the susceptibilities $\chi_i(x, y)$ coming from different regions of the x-y plane of the
layer. Hence, we can use a representation as in Fig. 3f to determine which part of
a given layer (domain walls or domains) contributes the most to $\chi_i$. It could be
tempting to interpret the layer dielectric constant $\varepsilon_i$ as coming from a collection of
parallel capacitors, which would formally allow us to map $\varepsilon_i(x, y)$. However, such
a construction implicitly assumes an equal potential drop across the individual
capacitors within layer i, which seems in conflict with the inhomogeneous in-plane
structure of our PbTiO3 layers.

For the calculation of local polarizations, we evaluated the local dipole and
cell volume from the atomic positions and Born effective charges. We computed
dipoles centred on the A (Pb/Sr) and B (Ti) sites of the perovskite structure, by 
considering the weighted contributions of the surrounding atoms. Thus, for exam- 
ple, the dipole centred on a specific Ti cation was computed by adding up contri- 
butions from the Ti itself, the six neighbouring oxygens (each such contribution 
was divided by two, because each oxygen has two first-neighbour Ti cations) and 
the eight neighbouring A (Pb/Sr) cations (each such contribution was divided by 
eight, as each A cation has eight first-neighbour Ti cations). 
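The weighting scheme for a Ti-centred dipole can be sketched as follows. The sketch makes several simplifying assumptions that are not taken from the text: Born effective charges are treated as scalars, not tensors; atomic displacements are measured from a reference structure; and the function name and argument layout are hypothetical.

```python
def ti_centred_dipole(u_ti, u_o, u_a, z_ti, z_o, z_a):
    """Dipole centred on one Ti cation, in units of charge times length.

    u_ti: displacement of the Ti atom, [x, y, z].
    u_o: displacements of the 6 neighbouring oxygens (weight 1/2 each,
         because each oxygen is shared by two Ti cations).
    u_a: displacements of the 8 neighbouring A cations (weight 1/8 each,
         because each A cation is shared by eight Ti cations).
    z_ti, z_o, z_a: scalar effective charges (a simplification).
    """
    if len(u_o) != 6 or len(u_a) != 8:
        raise ValueError("expected 6 oxygen and 8 A-site neighbours")
    dipole = [z_ti * component for component in u_ti]
    for u in u_o:
        for k in range(3):
            dipole[k] += 0.5 * z_o * u[k]
    for u in u_a:
        for k in range(3):
            dipole[k] += 0.125 * z_a * u[k]
    return dipole
```

With these weights the contributions add up to exactly one ABO3 formula unit (one Ti, three O and one A), so a rigid translation of all atoms gives zero dipole whenever the charges satisfy z_ti + 3 z_o + z_a = 0. Dividing the dipole by the local cell volume then gives the local polarization.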
Relation between atomistic simulations and experiment. As already mentioned,
our model potentials for PbTiO3-SrTiO3 superlattices are not expected to render
quantitatively accurate results. The difficulties in reproducing the behaviour of
the bulk compounds in a quantitative way are discussed in refs 24, 46, in which
evidence is given of the challenge these materials pose to first-principles methods.
The model deficiencies are best captured by the error in the obtained transition
temperatures: the model for PbTiO3 used here gives a value of 440 K when solved
in bulk-like conditions, far below the experimental result of 760 K. Similarly, we do
not expect our models to accurately capture the dielectric response of the SrTiO3
layers in the superlattice, which tend to be stiffer than the experimental ones.
Fortunately, beyond these quantitative inaccuracies, the qualitative behaviour of
individual PbTiO3 and SrTiO3 obtained from our simulations, as well as that of the
PbTiO3-SrTiO3 superlattice, seems consistent with experimental observations
and physically sound.

With regard to the results for the PbTiO3-SrTiO3 system, our atomistic
simulations predict a phase transition occurring in two steps: at a relatively high
temperature (490 K), the c/a ratio of the PbTiO3 layers clearly reflects the onset of local
instantaneous ferroelectric order; then, at a lower temperature (about 370 K), the
static multidomain ferroelectric state freezes in. Thus, according to these
simulations, the interval between 370 K and 490 K is characterized by strongly fluctuating
ferroelectric domains. This result is likely to be affected by finite-size effects in our
simulations; yet, given the very large separation of the two transition temperatures,
and the easy and frequent domain rearrangements observed in our Monte Carlo
simulations, we believe it should be taken seriously. Experimentally, preliminary
measurements of the (5, 4)$_{28}$ PbTiO3-SrTiO3 superlattice (Extended Data Fig. 6)
indicate that the kink in the measured temperature dependence of the c/a ratio
(usually assumed to mark the ferroelectric transition) occurs at a temperature
about 50 K higher than the appearance of domain satellites in the diffuse
scattering. A similar temperature difference has been noted in figure 3 of ref. 47 for
a superlattice of a different composition. However, further investigation is needed
to clarify the polarization structure in this temperature range.

We also ran simulations of various $(8, n_d)$ superlattices, and computed the
corresponding total dielectric constants as a function of temperature, to mimic our
experimental approach to estimate the response of the PbTiO3 layer. Extended Data
Fig. 3 shows the results for $\varepsilon_f$ obtained in this way: we find a temperature interval
in which the PbTiO3 layers present a negative dielectric constant, which validates
our experimental strategy for estimating $\varepsilon_f$.

If we compare the results in Extended Data Fig. 3 and Fig. 3c, then we note that
the temperature interval in which the negative capacitance is observed is essentially
the same, but the quantitative values for $\varepsilon_f$ clearly differ. Nevertheless, given the
approximations involved in each of the two methods to compute $\varepsilon_f$ (for example,
heuristic division into PbTiO3 and SrTiO3 layers and the implicit assumption
that PbTiO3 layers in $(8, n_d)$ superlattices of varying SrTiO3 content behave
equivalently), these quantitative discrepancies do not seem very substantial and
we have not investigated them further.

Finally, returning to our phenomenological predictions in Fig. 1c, it would 
appear that the Landau-Ginzburg result for the static domain structure bears 
a closer qualitative resemblance to the experimental data and the atomistic 
simulation results than does the Kittel model, despite the fact that it is the Kittel 
model that correctly captures the important contribution of domain-wall motion. 
However, in the Kittel model, we have not included the possibility of domain-wall 
pinning (by defects or otherwise), which would reduce the domain-wall contribution
and could lead to an upturn of $\varepsilon_f$ at low temperatures. Quantifying the relative
contributions to negative capacitance from domain-wall motion and the static 
domain response, both experimentally and through atomistic simulations, would 
be a worthwhile challenge for future studies. 
Extended Data Figure 1 | XRD characterization of the superlattices.
a, Intensity profiles around the (002) substrate reflection for
Pb0.5Sr0.5TiO3-SrTiO3 (PST-STO) superlattices. The broad peaks
around 2θ = 45.5° correspond to the top and bottom SrRuO3 electrodes.
Finite-size oscillations due to the 200-nm superlattice thickness are
visible. b, c, XRD domain satellites for Pb0.5Sr0.5TiO3-SrTiO3 (b) and
PbTiO3-SrTiO3 (PTO-STO; c) superlattices. Insets, domain periodicities
obtained from fitting the $Q_x$ line profiles using a sum of two Gaussian
functions for the domain satellites and a Lorentzian function for the central
Bragg peak. The error bars were determined from the 95% confidence
bounds for the peak positions obtained from the fits. $n_d$ is the number of
SrTiO3 layers in the $(14, n_d)$ (b) or $(5, n_d)$ (c) superlattices; $Q_x$ is the
in-plane reciprocal-space coordinate.
Extended Data Figure 2 | Local polarization distribution at low
temperature. Arrows indicate the dipole component within the (110)
plane; we plot arrows for Pb/Sr-centred and Ti-centred dipoles. The
colouring indicates the polarization P component along [110], revealing a
low-temperature polar order at the domain walls. PTO, PbTiO3;
STO, SrTiO3.
Extended Data Figure 3 | Comparison with experiment. Reciprocal dielectric constant $1/\varepsilon_f$ of the PbTiO3 layers as a function of temperature T,
calculated from the computed total dielectric constants of $(8, n_d)$ superlattices using the same analysis as for the experimental data.

23 JUNE 2016 | VOL 534 | NATURE | 535 
© 2016 Macmillan Publishers Limited. All rights reserved 




[Extended Data Figure 4a: plot against temperature T (K), 300-500 K.]


Extended Data Figure 4 | Interface capacitance contributions. a, SrRuO3-SrTiO3 interface contribution $C_i$ to the dielectric response. b, Dielectric
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estimated uncertainties obtained from weighted-least-squares linear fits. 
Extended Data Figure 5 | Dielectric impedance spectroscopy of
PbTiO3-SrTiO3 superlattices. Real ($C'$; filled circles) and imaginary
($C''$; open circles) parts of the complex capacitance function $C = C' + iC''$
for a (5, 8)$_{30}$ PbTiO3-SrTiO3 superlattice. For temperatures below about
650 K, the data are well fitted by a single parallel R-C element in series
with $R_s$, as shown by solid curves for the 500 K (blue) and 600 K (orange)
data. At higher temperatures, Maxwell-Wagner relaxations appear as the
conductivities of some layers increase faster with temperature than others.
At 700 K (red), the response is qualitatively captured by a model with
two parallel R-C elements in series with each other (dashed red curve),
whereas for a quantitative fit three R-C elements are required (solid
red curve). The inset shows the arrangement of elements in the generalized
equivalent circuit used to fit the data.
Extended Data Figure 6 | Temperature evolution of the tetragonality
and domain satellites. Intensity of the XRD domain satellite (filled red
circles) and the film tetragonality (c/a; open blue squares) for a (5, 4)$_{28}$
PbTiO3-SrTiO3 superlattice. The satellite intensity was obtained by
integrating the measured intensity of the domain satellites and subtracting
the minimum integrated intensity in the paraelectric phase. Vertical
blue line marks the temperature at which linear fits to the low- and
high-temperature data (blue lines) intersect.
Controlled fragmentation of multimaterial fibres
and films via polymer cold-drawing

Soroush Shabahang, Guangming Tao, Joshua J. Kaufman, Yangyang Qiao, Lei Wei, Thomas Bouchenot, Ali P. Gordon,
Yoel Fink, Yuanli Bai, Robert S. Hoy & Ayman F. Abouraddy


Polymer cold-drawing is a process in which tensile stress reduces
the diameter of a drawn fibre (or thickness of a drawn film) and
orients the polymeric chains. Cold-drawing has long been used
in industrial applications, including the production of flexible
fibres with high tensile strength such as polyester and nylon.
However, cold-drawing of a composite structure has been less
studied. Here we show that in a multimaterial fibre composed
of a brittle core embedded in a ductile polymer cladding,
cold-drawing results in a surprising phenomenon: controllable and
sequential fragmentation of the core to produce uniformly sized
rods along metres of fibre, rather than the expected random
or chaotic fragmentation. These embedded structures arise
from mechanical-geometric instabilities associated with 'neck
propagation'. Embedded, structured multimaterial threads with
complex transverse geometry are thus fragmented into a periodic 
train of rods held stationary in the polymer cladding. These rods 
can then be easily extracted via selective dissolution of the cladding, 
or can self-heal by thermal restoration to re-form the brittle thread. 
Our method is also applicable to composites with flat rather than 
cylindrical geometries, in which case cold-drawing leads to the 
break-up of an embedded or coated brittle film into narrow parallel 
strips that are aligned normally to the drawing axis. A range of 
materials was explored to establish the universality of this effect, 
including silicon, germanium, gold, glasses, silk, polystyrene, 
biodegradable polymers and ice. We observe, and verify through 
nonlinear finite-element simulations, a linear relationship between 
the smallest transverse scale and the longitudinal break-up period. 
These results may lead to the development of dynamical and 
thermoreversible camouflaging via a nanoscale Venetian-blind 
effect, and the fabrication of large-area structured surfaces that 
facilitate high-sensitivity bio-detection. 

When a longitudinal tensile stress is applied to a ductile polymer
fibre or sheet, the mechanical instability known as 'necking' reduces the
transverse dimensions of the sample and longitudinally orients the
constituent chains. The necked region is initially localized, but expands
uniformly via propagation of the 'shoulder' into undeformed regions of
the polymer until the neck is fully developed and extends throughout
the sample length (Fig. 1). The polymer is consequently left in a new,
anisotropic phase with potentially superior mechanical and/or optical
properties. For example, in the original observation of cold-drawing,
a highly oriented, transparent and robust neck developed in an
initially opaque and brittle polyester fibre. Commercial applications that
exploited the large tensile strength, low weight and high flexibility of
cold-drawn synthetic polymer fibres soon followed. After decades
of experimental and theoretical research, cold-drawing is now
reasonably well understood at a phenomenological level.

To date, investigations of cold-drawing have focused on the changes
that occur in monolithic structures (composed of a single material)
upon propagation of the shoulder, as well as further deformation
leading to material failure. Studies of the cold-drawing of polymer
composites have primarily concentrated on improving the mechanical
properties of bulk materials via fibre reinforcement. Conversely,
the study of fibre-reinforced composites has concentrated on fracture
phenomena driven by stress-transfer in a matrix. Here we report
a new dynamical phenomenon that exploits cold-drawing in
multimaterial composite structures, in the form-factor of a cylindrical fibre
or a flat film (Fig. 1b-d), and combines disparate materials, only
one of which (a thermoplastic polymer) is amenable to cold-drawing.
The mechanical-geometric mismatch between the ductile polymer
(which undergoes cold-drawing) and the other, relatively brittle
material (which does not) is harnessed as a scalable, mechanical pathway
to produce a wide variety of complex multi-component nano- and
microstructures with arbitrary cross-sectional geometries through
the controllable local fragmentation of the brittle materials within the
axially propagating polymer shoulders.

We start by describing our observations in the context of the
cylindrical multimaterial fibre shown in Fig. 1e, f. The fibre consists of a
20-µm-diameter glass core (the inorganic chalcogenide glass As2Se3)
embedded in a 1-mm-diameter polymer cladding (the thermoplastic
polymer polyethersulfone, PES). Fibres can be tens of metres in
length; see Supplementary Information for fibre fabrication. The use
of PES is not required; we have reproduced our results in fibres made
of other thermoplastic polymers, including polycarbonate,
polyetherimide, polysulfone and cyclic olefin polymer. At room temperature, the
core is brittle whereas the polymer is ductile. At a homogeneous
uniaxial extension of a few per cent, necks form locally and extend until the
fibre is fully drawn. Videos capturing the neck expansion and shoulder
propagation along the fibre in real time reveal a surprising dynamical
phenomenon that takes place inside the polymer shoulders during
their propagation along the fibre (Supplementary Video 1). Although
the core is initially intact along the fibre axis, within the propagating
shoulder the glassy core fragments in an orderly sequence (upon
passage of the shoulder) into a periodic train of cylindrical rods that are
held stationary in the fibre and separated by voids. As the shoulders
propagate further, they continue to fragment the core in situ until
they consume the whole length of the fibre, or until the applied stress
is removed. We harvest the glass micro-rods by selectively dissolving
the polymer with an organic solvent (dimethylacetamide, DMAC;
Fig. 1g-j); the rough faceted surfaces confirm that the rods result from
brittle fracture (Fig. 1j, inset).

This phenomenon is reminiscent of shear-lag fracture (SLF) in 
fibre-reinforced composites with low interfacial strength'*"!”. SLF is 
a quite general phenomenon occurring in composites of mechani- 
cally incompatible materials. Previous work has generally sought to 
suppress SLF as a means of increasing the ductility of fibres, whereas 
here we have induced a controlled form of SLF at the neck front. 

Figure 1 | Fragmentation via a cold-drawing-induced, propagating,
mechanical-geometric instability. a, Photograph of a polymer fibre
undergoing cold-drawing under axial stress at a speed of approximately
5 mm s⁻¹. Multiple shots taken over 1 min are overlaid to highlight the
extent of fibre elongation. b-d, Scanning electron microscope (SEM)
micrographs of the propagating shoulder in polymer (PES) fibres with
cross-sections that are circular (diameter of 0.7 mm; b), rectangular (side
lengths of 0.2 mm and 1 mm; c) and equilaterally triangular (side length
of 0.4 mm; d). Scale bars, 400 μm. e, Transmission optical micrographs
of a multimaterial cylindrical fibre undergoing cold-drawing at 3 mm s⁻¹
captured at three different stages: (i) initially intact fibre; (ii) neck
formation; and (iii) shoulder propagation, leaving behind a fractured core
after fragmentation. The cladding is a polymer ‘P’ (PES) and the core is
a glass ‘G’ (As2Se3). f, A magnified transmission micrograph of the neck
region, corresponding to the dashed black rectangle in e. The dashed white
rectangle highlights the propagating instability, wherein fragmentation
takes place. g, Schematic of selective dissolution of the polymer cladding
to retrieve intact cores. h, SEM micrograph of retrieved intact glass
cores from multiple fibres. i, j, Schematic (i) and SEM micrograph (j) of
retrieved nano-fragmented micro-rods by selective dissolution from a
cold-drawn fibre. Inset in j is an SEM micrograph of a single micro-rod.




Although a ‘local’ brittle fracture takes place, as in traditional SLF 
studies, the global dynamics are different; see Extended Data Fig. 1. 
We have found that this phenomenon applies to many core materials. 
These materials include dense solids, such as crystalline semiconduc- 
tors (Si and Ge)!8, and inorganic glasses (such as silicates, phosphates, 
chalcogenides and tellurites)'*. Relatively brittle polymers that do not 
typically undergo cold-drawing themselves can also be fragmented 
in this manner—including polystyrene, the biodegradable polymer 
polyethylene oxide, and even natural polymers such as silk and human 
hair. Indeed, even ice inside a hollow-core polymer fibre undergoes 
a similar break-up process before melting (Supplementary Fig. S6). 
Systematic investigation leads to several non-intuitive findings. 
First, the average length L of the glass rods is linearly proportional to 
the core diameter D, as shown in Fig. 2a. Second, the proportionality
constant f = L/D depends on the core material, but not on the details
of the experimental procedure (such as drawing speed, pre-stress of 
the polymer cladding or fibre outer diameter). Specifically, although 
the values of the strain at which necking is initiated and the final draw- 
ing ratio both depend on the fabrication conditions of the fibre, such 
as the tension under which it is thermally drawn from the melt (Fig. 2c),
they do not affect f. Note that the linear L/D proportionality is ‘cut off’
at an upper limit set by the length of the shoulder region $L_{\mathrm{shoulder}}$: it
requires that $L, D < L_{\mathrm{shoulder}}, D_{\mathrm{cladding}}$, where $D_{\mathrm{cladding}}$ is the diameter
of the fibre cladding. Third, by determining the ratio f for a wide range
of materials (Fig. 2b and Extended Data Table 1), we find empirically
that $f \propto \sqrt{E}$, where E is the Young’s modulus of the core; for example,
As2Se3 cores always have $f \approx 6$, whereas tellurite glass cores have $f \approx 3.9$
(Fig. 2a). Analytic treatments of SLF vary in sophistication from Cox’s
original heuristic treatment” to Nairn’s rigorous approach", but agree
that $L/D = \sqrt{E/\Omega}$, consistent with Fig. 2b. Here $\Omega$ is a characteristic
stress or energy density; for example, $\Omega$ is the Young’s modulus of the
cladding in Cox’s theory, whereas later theories yield more complicated
expressions for $\Omega$. The data we show in Fig. 2b indicate that
$\Omega \approx 0.1$ GPa in our experiments, and the data in Fig. 2c reveal that
the true stress within the neck is also approximately 0.1 GPa (our key
stress scale).
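
As a worked illustration of the scaling described above, the following minimal Python sketch evaluates the ansatz f = sqrt(E/Ω) with Ω = 0.1 GPa and converts an aspect ratio into an expected average rod length L = fD. The function names are hypothetical, and the modulus used in the example is chosen only so that the arithmetic lands on f = 6; it is not a measured value from this work.

```python
import math

OMEGA_GPA = 0.1  # characteristic stress scale quoted in the text (GPa)


def aspect_ratio_from_modulus(youngs_modulus_gpa, omega_gpa=OMEGA_GPA):
    """Ansatz f = sqrt(E / Omega); both arguments in GPa."""
    if youngs_modulus_gpa <= 0 or omega_gpa <= 0:
        raise ValueError("moduli must be positive")
    return math.sqrt(youngs_modulus_gpa / omega_gpa)


def rod_length(core_diameter, aspect_ratio):
    """Average rod length L = f * D, in the same unit as core_diameter."""
    return aspect_ratio * core_diameter


if __name__ == "__main__":
    # 20-um-diameter As2Se3 core with the reported f of about 6
    print(rod_length(20.0, 6.0))
    # Illustrative modulus only (hypothetical input, not a measurement)
    print(aspect_ratio_from_modulus(3.6))
```

With the 20-μm core and f ≈ 6 quoted in the text, the sketch gives an average rod length of about 120 μm, provided that both L and D stay below the shoulder-length cut-off.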

These results are non-intuitive because stress distributions in the 
region in which the shoulder meets un-drawn material are known to 
be complex”!, and the presence of the core introduces additional strain 
localization due to the mechanical incompatibility between the core 
and cladding. Some essential features may be captured using a simple
Considère model for core fracture in the shoulder region (Fig. 2d).
The fragmentation of glass cores may be readily understood as arising
from a mechanical-geometric instability. The Considère criterion for
necking of solid materials states that undeformed and necked regions
coexist at equal engineering stress. Within the shoulder, complex gradients
of stress and stretch are present, but the brittle fracture of glass cores
occurs along the ‘coexistence’ line shown in Fig. 2d. The specific location
along this line at which fracture occurs is material-dependent and sets the
ratio f.
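
For readers who want the criterion in symbols, the standard textbook form of the Considère construction can be sketched as follows. The notation (true stress σ, stretch λ, engineering stress s) is introduced here for illustration and is not taken from the paper, and constant volume during drawing is assumed.

```latex
% Assumed notation: \sigma = true stress, \lambda = stretch, s = engineering stress.
% Constant volume gives a current cross-section A = A_0/\lambda, so
\[
  s(\lambda) = \frac{\sigma(\lambda)}{\lambda}.
\]
% Necking begins where the engineering stress reaches a maximum:
\[
  \frac{ds}{d\lambda} = 0
  \quad\Longleftrightarrow\quad
  \frac{d\sigma}{d\lambda} = \frac{\sigma}{\lambda}.
\]
% Un-necked (\lambda_1) and necked (\lambda_2) regions carry the same load,
% so they coexist when
\[
  \frac{\sigma(\lambda_1)}{\lambda_1} = \frac{\sigma(\lambda_2)}{\lambda_2},
\]
% i.e. both states lie on one straight line through the origin
% of the (\lambda, \sigma) plane.
```

Under these assumptions, a line of constant engineering stress is a straight line through the origin in a plot of true stress against stretch, which is one way to read a ‘coexistence’ line of the kind drawn in Fig. 2d.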

Given the above-mentioned complexity of the stress distributions, 
and to better understand the experimental findings, we perform non- 
linear finite-element simulations of the cold-drawing process for a 
typical core/cladding (As2Se3/PES) cylindrical geometry; see Methods. 
Axisymmetric elements are used and the measured mechanical prop- 
erties (including elastic modulus, plastic hardening and fracture) of 
the core and cladding are input to the computational model (Extended 
Data Fig. 2). The same tensile boundary conditions used in the experi- 
ments are applied. The simulation results are illustrated in Fig. 2e with 
the contour plots of von Mises stress distributions as a function of 
stretch. In step (ii), necking and core-fracture initiate, with the latter 
associated with the onset of stress concentration. Necking continues 
and the shoulders propagate in steps (iii) and (iv), resulting in sequential 
fragmentation of the core with a length-to-diameter ratio f of the rods 
that agrees with the experiments (Extended Data Table 2). As the 
rods pass into the necked region, the increased stretch expands the 
voids separating the fragments. The final fully drawn configuration 
is shown in step (v); see Supplementary Video 2 for a video of the 
simulation, and Extended Data Fig. 1 for a comparison with the more 
typical stress-transfer fragmentation in fibre-reinforced composites. 

As we have demonstrated through cold-drawing of fibres with rec- 
tangular and triangular cross-sections (Fig. 1c, d), the phenomena 
associated with cold-drawing in multimaterial fibres are not restricted 
to cylindrical geometries. In the flat-fibre geometry, new in-fibre frag- 
mentation phenomena are observed. In particular, when a thin brit- 
tle film is embedded in such a fibre (Fig. 3a, b), propagation of the 
rectangular shoulder upon necking leads progressively to fragmen- 
tation of the film into well-ordered strips (Supplementary Video 3). 
Comparable fracture was reported in ref. 22, in which deformation 
of metal films on polymer substrates was examined and strategies for 
maximizing the ductility of these composites were explored. As seen 
in Fig. 3c, the straight, sharp-edged strips resulting from a glass film 
fragmenting extend across the width of the fibre (about 1 mm) and are 
separated by rectangular voids in a well-ordered array extending along 
the whole fibre length. The optical properties of the flat fibre change 
markedly as a result of cold-drawing. The undrawn fibre contains 

Figure 2 | Characterization of fragmentation induced by cold-drawing of
a PES fibre. a, Measurements of the average length L of fragmented micro-
and nano-rods of chalcogenide glass (ChG; As2Se3; red circles) and tellurite
glass (TeG; 70TeO2-20ZnO-5K2O-5Na2O; blue squares) of diameter D in a
PES fibre upon cold-drawing. The red solid and blue dashed lines are linear
fits with slopes f = L/D = 6 and 3.9 for ChG and TeG, respectively. Vertical
error bars represent the root-mean-squared (r.m.s.) length dispersion of
rods at each value of D (Supplementary Information). Insets are SEM
micrographs of individual rods resulting from the cold-drawing-driven
fragmentation of As2Se3 cores of diameters (from left to right) 200 nm, 1 μm
and 10 μm. b, Measured values of f for a host of materials embedded in a PES
fibre plotted against their Young’s modulus E; see Extended Data Table 1.
The dashed line corresponds to the ansatz $f \propto \sqrt{E/\Omega}$, with $\Omega = 0.1$ GPa
(such that $f = \sqrt{10E}$ when E is in gigapascals). Vertical error bars represent
the measured r.m.s. dispersion in f; horizontal error bars correspond to the
uncertainty in the measured E (those for TeG and ice”? reflect the range of
reported values). PhG, phosphate glass; PEO, polyethylene oxide; PS,
polystyrene. c, Stress-strain measurements of cylindrical PES fibres


a 300-nm-thick, continuous As2Se3 film (Fig. 3b), which renders the
fibre opaque. Fragmentation (Fig. 3c) reduces the optical opacity as 
a result of the voids opening up between the strips in the transparent 
polymer (Fig. 3d, e). Optical spectral transmission measurements 
reveal a blueshift of approximately 300 nm in the wavelength of the 
absorption edge after film fragmentation (Fig. 3d). These observations
are compared to theoretical predictions for a Fabry-Pérot optical
model of the fibre that additionally takes into account absorption, dif- 
fraction and Fresnel reflection”? (Fig. 3e); see Methods. It is important 
to note that the change in the optical properties upon cold-drawing is a 
consequence of the mechanical-geometric transformation undergone 
by the embedded structure and not of the polymer itself. 

Figure 3f-k reveals the effect of polymer pre-stress (applied during 
thermal drawing of the fibre from the melt) on the uniformity and 
integrity of the fragmented strips. Large pre-stresses (Fig. 3f, i) pro- 
duce smaller draw ratios” (less transverse contraction during neck 

produced by thermal drawing” at different pre-stress values (ranging from
0.2 MPa to 7.8 MPa; see colour-coded labels) identifying the four stages of
linear elasticity, cold-drawing, strain-hardening and failure. The coexistence
(dashed) lines and natural draw ratios (vertical dotted lines; values given
above the plot) at neck stabilization (both defined in d) are identified.
d, Schematic representation of the Considère model. The blue curve
corresponds to the true stress versus stretch in a strain-controlled
experiment. The solid black ‘coexistence’ line indicates necked and unnecked
regions coexisting at equal engineering stresses as local stretch varies from
the onset of necking to the natural draw ratio at which stable neck
propagation occurs. The dashed black lines serve as guides to the eye. Above
the plot is an SEM micrograph of a necked region in a PES fibre; the arrows
indicate the direction of the axial stress and shoulder propagation. e, von
Mises stress distributions from finite-element simulations of in-fibre core
(As2Se3) fragmentation during cold-drawing (at 5 mm s⁻¹) of a PES fibre.
The five steps (i)-(v) correspond to increasing stretch values. Top panels
depict the full fibre; bottom panels show the regions corresponding to that
highlighted by the rectangle in (i).


propagation) and a more uniform stress field within the shoulder, 
resulting in longer and more-parallel strips. This effect is a consequence 
of the greater orientation of polymer chains along the thermal drawing 
direction when compared to fibres drawn at lower pre-stress values. 
Low pre-stresses result in opposite trends; the film fragmentation is 
more violent because the strain field is more two-dimensional; that is, 
both the axial and transverse components are non-negligible. Shear 
bands in low-melt pre-stressed fibres are expected to play a substantial 
part in this outcome, which is borne out in finite-element simulations; 
see Methods, Extended Data Fig. 3, and Supplementary Video 4 for a 
video of the simulation. We retrieve in our simulations the experimen- 
tally observed ratio of strip width to film thickness (Fig. 3f-k) and also 
observe prominent shear-banding in the polymer that could account 
for the more violent fragmentation in the case of low pre-stress. 
Several unique features of the in-fibre fragmentation process are 
highlighted in Fig. 4. First, the procedure extends to scenarios in which 

Figure 3 | Fragmentation of a thin film embedded in a flat polymer
fibre undergoing cold-drawing. a, Optical micrograph of a flat PES (‘P’)
fibre containing a 300-nm-thick film of As2Se3 (‘G’) (cross-section shown
in inset), cold-drawn under axial stress. The propagating flat shoulder in
the direction of the arrow separates an intact region (darker colour) and
a reduced-size cold-drawn region (lighter colour). b, SEM micrograph
of intact glass (‘G’) film retrieved from the fibre before cold-drawing.
c, SEM micrograph of fragmented glass strips retrieved from the fibre after
cold-drawing. d, e, Measured (d) and simulated (e) optical transmission
through the flat fibre before (‘F’) and after (‘C’) cold-drawing. For
reference, the transmission through the polymer alone (‘P’) is plotted in
d. Inset is a photograph of a section of a fibre showing the change in the
optical appearance after cold-drawing-driven fragmentation. f-h, Optical
transmission micrographs of the necking region for three fibres produced
at different pre-stress levels (250 g, f; 150 g, g; 50 g, h). The white arrows
indicate the axial stress applied and the black arrow indicates the
shoulder-propagation direction. Scale bars, 200 μm. i-k, Optical transmission
micrographs of the sections in f-h enclosed in dashed boxes. Scale bars,
50 μm.


a large number of cores are embedded in a single polymer fibre*® 
(about 4,000 cores, each of 500-nm diameter; Fig. 4a). All of the cores 
simultaneously undergo fragmentation into uniformly sized rods. 
Thus, this process is readily scalable to large-volume production. 
Supplementary Video 5 shows an example of a parallelepiped polymer
fibre containing six cylindrical As2Se3 cores all undergoing fragmen- 
tation upon passage of the shoulder. Second, the length-to-diameter 
ratio may be tuned by using cylindrical shells rather than solid cores 
(Fig. 4b). Indeed, L/D for As2Se3, for example, changes from 6 for a 
solid core (Fig. 2a) to 4.5 and 3.2 when the ratio of the outer to inner 
diameter of the shell is 4 and 1.5, respectively (Fig. 4c). Here, only the 
shell undergoes fragmentation, while both the polymer inner core and 
the cladding are cold-drawn. After dissolving the polymer, hollow glass 

Figure 4 | Characteristics of cold-drawing-driven fragmentation.
a, Scalability of the in-fibre process. SEM micrograph of the fragmented
rods retrieved from a 1-mm-diameter PES fibre containing about 4,000
500-nm-diameter As2Se3 cores after cold-drawing. The left inset is a
schematic of the process and the right inset highlights the size of the
rods. b, Optical transmission micrographs before and after cold-drawing
of a fibre whose core has a core-shell structure, with the inner core and
cladding both PES (‘P’) and the shell As2Se3 (‘G’). Only the shell undergoes
fragmentation, while the inner core and cladding are both cold-drawn.
c, SEM micrographs of single hollow rods retrieved from a fibre such
as in b with different ratios of inner (polymer-core; $D_{\mathrm{i}}$) to outer
(glass-shell; $D_{\mathrm{o}}$) diameter. d, SEM micrographs of complex particle
structures. From left to right, parallelepiped Janus rod, hollow cylindrical
Janus rod, and triangular rod with square hole. Scale bars in c and
d, 10 μm. Schematics below the SEM micrographs in c and d show
cross-sections of the structures, with As2S3 (G1) coloured orange and
(As2Se3)99Ge1 (G2) coloured black. e, Self-healing of glass fragments after
thermal restoration. f, Photograph showing a polymer film (PES, ‘P’)
with gold (Au) deposited on it before and after cold-drawing. SEM
micrographs (left and right columns show two different magnifications) of
the fragmented 20-nm-thick and 70-nm-thick gold films.


microtubes are retrieved. Third, the geometry of the micro- or nano- 
particles produced is limited only by the ability to structure the con- 
tinuous core. Examples in Fig. 4d include bicompartmental 
parallelepiped Janus particles, hollow cylindrical Janus particles, and 
triangular particles with square holes. Such structures are produced 
at the macroscopic scale via extrusion and then thermally drawn into
a fibre with the desired transverse size; see Supplementary Information 
for further structures (including solid cylindrical Janus and core-shell 
particles). Cold-drawing then fragments the structures into particles 
while maintaining the complex cross-section. Fourth, the fragmenta- 
tion is thermoreversible (Fig. 4e): heating the drawn fibre above its 
glass transition temperature results in self-healing of the fragmented 
core as the initial fibre dimensions are restored. Upon heating the 
fibre to its softening temperature, the tensile stress is released and 
the polymer fibre contracts to its initial length via an imperfect 
shape-memory effect (SME). As voids between the fractured glass 
segments are eliminated, the segments fuse together (self-heal) and 
re-form an intact longitudinal core. This self-healing of the core is not
itself an SME because, by definition, an SME can occur only in samples
with continuous strain histories.

Further, thermal drawing is not a necessary precursor to produc- 
ing structures that undergo in-fibre fragmentation. Commercially 
available polymer films may be directly exploited by coating them 
with a relatively brittle material. For example, we show in Fig. 4f the 
fragmentation of a 70-nm-thick gold layer that was sputtered onto a
75-μm-thick PES film. Cold-drawing of this composite results in fragmentation
of the gold film into segments with widths of roughly 1.1 μm
along the cold-drawing axis (see Extended Data Fig. 4 for the fragmentation
of gold films with thicknesses of 20-70 nm). These segments
produce a clear optical-diffraction-grating signature that is
apparent to the naked eye and indicates a grating period of approximately
2.2 μm, which is consistent with Fig. 4f (see Extended Data Fig. 5
for optical measurements). Alternatively, a dry-erase marker pen can
be used to write straight, thick lines (widths of 2-5 mm) on the PES
film (Extended Data Fig. 6). The ink interacts with the top layer of
the polymer film to form an ~1-μm-thick brittle crust that fragments
upon cold-drawing in a manner similar to that of the gold film.

Cold-drawing has been a mainstay of mass-production of contin- 
uous threads in the synthetic fibre and textile industries over most of 
the past century. It is a fundamental mechanical phenomenon that 
extends to the processing of macroscale metals”*”” and nanocrystals”*. 
We have shown that this process can be exploited to fabricate discon- 
tinuous three-dimensional arrays of multimaterial micro- and nano- 
structures. Our results suggest potential applications of cold-drawing 
in controlling the optical properties of macroscopic composite struc- 
tures through dynamical and thermoreversible nanoscale mechanical 
processes. This may lead to dynamical camouflaging via a nanoscale 
Venetian-blind effect, scalable fabrication of micro- and nanoparticles 
with arbitrary cross-sections, and large-area meta-surfaces for highly 
sensitive detection of pathogens. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS

Fibre fabrication. The polymer cylindrical (Figs 1 and 2) and flat (Fig. 3) PES 
fibres are produced by thermal drawing in a custom-built fibre draw tower. In the case
of core materials that are thermally compatible with PES (that is, they soften in 
an overlapping temperature range in which they have comparable viscosity'*"’), 
such as chalcogenide and tellurite glasses, the polymer cladding and glass core 
are co-drawn into a fibre from a centimetre-scale ‘preform’. The fibre pre-stress
is set during the thermal drawing process; thermal drawing is followed by a 
rapid quench. In the case of all other materials that are not thermally compatible 
with PES, microfibres of these materials are first prepared (see Supplementary 
Information) and then placed within a hollow-core PES fibre, which is thermally 
drawn from a hollow preform. The fibre assembly is heated to enable collapse of 
the PES around the core material and thus to produce strong adhesion between 
PES and the core. 

Thin-film preparation. The glassy chalcogenide films (Fig. 3) used are ther- 
mally evaporated under vacuum onto a thin PES film that is then incorporated 
into a flat preform, thermally consolidated under vacuum, and subsequently 
drawn into a fibre. The gold films (Fig. 4f) are deposited using a sputtering
system (Cressington 108) onto commercially available 75-μm-thick PES thin films
(Ajedium). The thickness of the gold is controllable up to 50 nm by varying the
sputter time. Multiple sputtering procedures are performed to achieve thicknesses
exceeding 50 nm.

Stress-strain measurements. Stress-strain measurements of the fibres are
gathered from uniaxial tension testing performed on an Instron universal test
machine with a load-cell resolution of 0.01 N; the specimen ends are fixed
using TestResources wave-type grips. The initial clear length of the fibre between
the two grips is typically at least 20 cm. Each test is conducted under displacement
control at 10 mm min⁻¹ to rupture.

Simulations for cylindrical fibres. The Explicit solver of the general-purpose 
finite-element code ABAQUS*! is used for the computational analysis of the 
cold-drawing and fragmentation processes. For the cylindrical fibres (Fig. 2e), an 
axisymmetric model is built using CAX4R elements*! (5 μm × 5 μm) consisting
of three concentric 2.52-mm-long cylindrical layers (Extended Data Fig. 2a). The
inner section (10-μm radius) corresponds to the core material, the cladding section
(500-μm radius) to the PES matrix. The stress-strain curve for PES (Extended
Data Fig. 2b) is based on the test data (Fig. 2c). These two layers are separated by 
an interfacial layer** that possesses similar properties to PES, but with much 
weaker strength; we thus take the stress-strain curve for this interfacial layer to 
be the same as for PES, but multiplied by a scalar factor that depends on the core 
material. Extended Data Fig. 2c shows the stress-strain curve for the interfacial 
layer when the core is As2Se3. For the core section, we use a brittle-cracking failure
model* available in ABAQUS, which includes a linear elastic range (taken from 
test data; see Supplementary Fig. 14f) and softening behaviour at crack propaga- 
tion. We use an explicit code to predict necking in the PES cladding****. 
Specifically, we use the elastic-viscoplastic continuum model proposed in ref. 36. 
The model is calibrated with respect to the test data and implemented in ABAQUS 
using the material subroutine VUMAT. In the simulations, the lower grip is fixed 
and the upper grip is assigned a constant velocity of 5 mm s⁻¹ upwards.
Simulations are conducted for multiple core materials, including As2Se3 (Extended
Data Fig. 2d), Si (Extended Data Fig. 2e), Ge and polystyrene. The results are 
summarized in Extended Data Table 2. The average length-to-diameter ratios 
L/D of fragmented rods and their standard deviations correlate well with the 
measurements (Fig. 2b). 

Simulations for flat fibres. The flat-fibre cold-drawing simulations are carried 
out using a similar procedure to that for the cylindrical-fibre simulations. A plane- 
strain model is built in the plane spanned by the longitudinal axis of the fibre and 
the fibre thickness (y-z plane in Fig. 3a). This is a good approximation for the 
thin film because the strain in the transverse direction is much smaller than that 
in the other two directions during cold-drawing. A quarter of the cross-section is 
modelled, with symmetric boundary conditions, using CPE4R elements*! approxi- 
mately 75 nm x 1,000 nm in size. There are three sections defined in the simulation: 
the film, an interfacial layer and the outer PES cladding. The dimensions used in 
the simulation (Extended Data Fig. 3) are the same as those in the experiment 
(Fig. 3a): the film and fibre thicknesses are 300 nm and 350 μm, respectively, and
the initial fibre length is 1.44 mm. The material models used are the same as those
used in the cylindrical cold-drawing simulations above (Extended Data Fig. 2). The
moving grip applies a constant tensile velocity of 2 mm s⁻¹. Extrusion and mirror
methods*! are used to visualize the results in three dimensions. The width of the
fragmented strips in the cold-drawing simulation is 7.55 μm, which is in good
agreement with the measurements. 

Animations. Supplementary Videos 2 and 4 show simulations of the cylindrical 
fibre and of the flat fibre, respectively. In each video, the frames of the
above-described simulations are assembled, in addition to a view of the insets shown in
Fig. 2e (for the cylindrical fibre) and Extended Data Fig. 3 (for the flat fibre). In
both videos, the speed of the frames is reduced by a factor of 100 compared to the 
time steps of the simulation drawing speed. 

Optical model for transmission through fragmented glass films. We consider 
an optical model of the flat fibre in Fig. 3a consisting of a 300-nm-thick film of
As2Se3 embedded in a 300-μm-thick, 2-mm-wide rectangular PES fibre (Fig. 3a).
The refractive index of PES $n_{\mathrm{p}}(\lambda)$ is extracted from spectroscopic ellipsometry
measurements in the wavelength range $\lambda = 0.35$-2 μm. The refractive index of the
glass $n_{\mathrm{g}}(\lambda)$ is obtained from a Sellmeier equation*” and experimentally confirmed
by ellipsometric measurements, while spectral absorption in the visible range is
obtained from optical transmission measurements; optical transmission through
the thin glass film is negligible when $\lambda < 0.5$ μm. The glass-polymer interfaces give
rise to optical-field Fresnel reflection and transmission coefficients $r_{\mathrm{pg}}(\lambda)$ and
$t_{\mathrm{pg}}(\lambda)$; the two air-polymer interfaces contribute a transmission coefficient $t_{\mathrm{p}}(\lambda)$.
Transmission through the intact flat fibre is Tintact = |tp|” |tpgl* |1 + raeha tel P, in 
which d is the thickness of the glass film and we have taken into consideration only 
the first two Airy wavelets in the lossy Fabry-Perot resonator formed of the glass 
film (absorption in the film diminishes the effect of higher-order reflected wave- 
lets). Transmission through a cold-drawn sample embedding the fragmented glass 
strips is calculated by weighting the transmission through an intact sample $T_{\mathrm{intact}}$
and the transmission through a polymer sample $T_{\mathrm{p}} = |t_{\mathrm{p}}|^2$ with respect to the
relative areas of the fragmented strips and the voids separating them. If we define $\delta$
to be the ratio of the width of a void separating two strips to the width of a strip,
then the transmission through the cold-drawn sample is
$T = (\alpha T_{\mathrm{p}} + \beta T_{\mathrm{intact}})\,\eta_{\mathrm{diff}}$,
in which $\alpha = \delta/(1+\delta)$, $\beta = 1 - \alpha$,
$\eta_{\mathrm{diff}}(\lambda) = \int_{\mathrm{aperture}} I(x)\,dx \big/ \int I(x)\,dx$ is the fraction of light diffracted
through the evenly spaced glass strips that reaches the detector aperture as a function
of wavelength, and $I(x)$ is the intensity distribution in the detector plane. Using the
measured values of the refractive indices of the fibre materials in these formulae,
we calculate the theoretical optical transmission spectra plotted in Fig. 3e.
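
The area-weighting step described above can be illustrated with a minimal Python sketch. It assumes that the void area fraction follows from the width ratio as δ/(1 + δ), and that a single diffraction factor multiplies the weighted sum; the function name, argument names and the numerical inputs in the usage lines are hypothetical and are not values from the paper.

```python
def cold_drawn_transmission(t_polymer, t_intact, delta, eta_diff=1.0):
    """Area-weighted transmission through a fragmented film.

    t_polymer : transmission through the polymer alone (void regions), 0..1
    t_intact  : transmission through polymer plus intact glass film, 0..1
    delta     : void width divided by strip width (>= 0)
    eta_diff  : assumed fraction of diffracted light reaching the detector
    """
    if delta < 0:
        raise ValueError("delta must be non-negative")
    void_fraction = delta / (1.0 + delta)   # area fraction covered by voids
    strip_fraction = 1.0 - void_fraction    # area fraction covered by strips
    return eta_diff * (void_fraction * t_polymer + strip_fraction * t_intact)


if __name__ == "__main__":
    # Hypothetical inputs: equal void and strip widths (delta = 1)
    print(cold_drawn_transmission(0.8, 0.2, 1.0))
    # No voids (delta = 0) recovers the intact-film transmission
    print(cold_drawn_transmission(0.8, 0.2, 0.0))
```

In this sketch, widening the voids relative to the strips moves the result from the intact-film transmission towards that of the bare polymer, which is the qualitative trend reported for the cold-drawn fibre.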

Diffraction measurements on fragmented gold films. The gold-PES cold-drawn
samples diffract transmitted (and reflected, Fig. 4f) white light, leading to
angular-selective visible colours. To quantify this observation, the diffracted spectra are
measured in transmission mode as a function of the angle with respect to the
axis defined by normal incidence. A broadband, incoherent, white-light optical
source is coupled into a multimode optical fibre by means of a fibre collimator,
and out-coupled light is collimated using a ×4 microscope objective lens. The
5 mm × 20 mm sample is placed at a distance of 10 cm from the lens, whereupon the
beam diameter is approximately 4 mm. Diffracted light at angle $\theta$ is coupled into a
multimode fibre via a lens with a 50-mm focal length, and the spectra are recorded 
using an optical spectrum analyser (OSA; Advantest Q8381A); see Extended Data 
Fig. 5a. The diffracted light is blueshifted at small angles and redshifted at large 
angles (Extended Data Fig. 5b). 

Assuming normal incidence on the film, the ideal grating equation indicates
that $\lambda = \Lambda \sin\theta$, in which $\lambda$ is the wavelength, $\Lambda$ is the grating period and $\theta$ is the
diffraction angle of the first diffraction order with respect to the normal to the film.
We identify $\lambda$ with the peak diffracted wavelength $\lambda_{\max}$ in Extended Data Fig. 5b,
so that we have the following pairings ($\lambda_{\max}$, $\theta$): (420 nm, 10.6°), (700 nm, 17.4°)
and (800 nm, 21.2°). These measurements indicate grating periods $\Lambda$ of 2.3 μm,
2.34 μm and 2.21 μm, respectively, which are consistent with the SEM micrographs
in Fig. 4f.
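
The arithmetic behind these grating periods can be checked with a short Python sketch that applies the first-order grating equation at normal incidence to the three wavelength and angle pairs quoted above. The function and variable names are hypothetical; only the numerical pairs come from the text.

```python
import math


def grating_period_nm(wavelength_nm, angle_deg):
    """First-order grating equation at normal incidence: period = lambda / sin(theta)."""
    return wavelength_nm / math.sin(math.radians(angle_deg))


# (peak wavelength in nm, diffraction angle in degrees) pairs quoted in the text
PAIRS = [(420.0, 10.6), (700.0, 17.4), (800.0, 21.2)]

if __name__ == "__main__":
    for wavelength, angle in PAIRS:
        period_um = grating_period_nm(wavelength, angle) / 1000.0
        print(f"{wavelength:.0f} nm at {angle} deg: period of about {period_um:.2f} um")
```

Each pair gives a period between roughly 2.2 μm and 2.35 μm, about twice the 1.1-μm segment width reported in the main text, as expected if each period contains one gold segment and one gap of similar width.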

Preparing ink-written films for cold-drawing. The ink-PES samples are prepared
by cutting a 75-μm-thick PES film (Ajedium) into strips with dimensions of
about 5 mm × 10 cm. Using a black permanent marker, thick lines (approximately
2-5 mm wide) are painted on one face of the strips. The marker tip is traced on
the PES film only once to avoid non-uniformities in the ink layer left on the film 
surface (Extended Data Fig. 6a). The samples are dried for a few seconds and then 
cold-drawn by symmetrically pulling both ends using a pair of pliers (Extended 
Data Fig. 6b). The optical appearance of the film (Extended Data Fig. 6c) changes 
immediately after cold-drawing, whereupon bright coloured optical diffraction 
bands are visible to the eye. SEM micrographs of the surface of the film (Extended 
Data Fig. 6d, e) reveal that the ink absorbed at the surface of the film produces a 
crust with an average thickness of <1 μm. The thickness of the crust tapers at the 
edges of the drawn line. We find that the ink crust after cold-drawing fragments 
into strips whose width is proportional to the crust thickness. The ink-crust strips 
are parallel and aligned orthogonally to the cold-drawing axis (the long dimension 
of the polymer strip), similarly to the observations in Fig. 3 with the glass films 
embedded in a flat polymer fibre. Thus, at the edge of the line where the thickness 
drops rapidly, we concomitantly observe a rapid drop in the width of the strips 
(Extended Data Fig. 6e). 

Cold-drawing break-up versus stress transfer in fibre-reinforced composites. 
The results presented here are related to the shear-lag fracture (SLF) phenomena 
often observed in fibre-reinforced composites. However, the fragmentation we 
report is distinct from canonical SLF in both aim and character, as follows. 

(1) In traditional composites, the role of the core is to strengthen the fibre. By 
contrast, our goal is to exploit the cold-drawing process to produce controlled 
fragmentation of the core. In other words, we exploit SLF rather than seeking to 
avoid it as has been the focus of most previous work. 

(2) There are several aspects that distinguish the SLF process reported here 
from canonical SLF. Since the seminal work by Cox, the standard theoretical 
frameworks for analysing stress transfer in composites have used uniaxial models. 
Most treatments have assumed that there is no load transfer at the fibre ends 
because they are both ‘free’. By contrast, our fibres are semi-infinite; fracture occurs 
at the free end, while the other end remains tethered to the remainder of the core. 
More fundamentally, cold-drawing necessarily produces non-trivial multi-axial 
stresses, especially in thin films. We expect that these distinguishing features 
will stimulate future theoretical and experimental investigations into SLF. 

(3) In contrast to canonical SLF where fragmentation typically occurs simul- 
taneously in many locations, our core fibres fragment only within the advancing 
neck front. Once a fragment of the core is separated, it does not undergo any 
further fracture. 

(4) Core fracture occurs as the core passes through the propagating shoulder, 
producing multimaterial rods of controllable length. Alternatively, when one of the 
materials making up the core is the same polymer used as the cladding, this portion 
of the core/fibre undergoes cold-drawing and remains intact while the remaining 
materials fracture. This feature enables the production of hollow structures via 
selective dissolution of polymer. 

Extended Data Fig. 1 illustrates the distinction between SLF that takes place in 
the process reported here and in traditional fibre-reinforced composites. The nature 
of each fracture event taking place during the necking of the polymer fibres is the 
same as that occurring in SLF in composite materials. In both cases, a local brittle 
fracture takes place. The fundamental difference lies in the global dynamics of the 
break-up process. 

Traditional SLF. In the case of traditional SLF (Extended Data Fig. 1b), the 
setting is typically that of a composite material, usually fibres in a matrix (what we 
call core in a cladding). When axial stress is applied, random brittle fractures occur 
along the fibres. This process continues as long as the stress is applied or until a 
minimum size of the fractured fibre segments is reached. If the stress is removed 
before saturation is reached, then what remains is a collection of fibre fragments 
of unequal, random lengths. Re-applying the axial stress leads to a continuation of 
the brittle fracture (still localized in the shoulder) until saturation. Post-saturation, 
the axial stress does not lead to further fracture. 

Fragmentation during cold-drawing. In the case of fragmentation during cold-drawing 
(Extended Data Fig. 1a), the polymer cladding (or matrix) undergoes necking 
that is by and large independent of the brittle core. During the propagation of the 
neck upon axial stress, the core undergoes brittle fracture in an orderly fashion on 
a global scale: the fracture occurs sequentially within the moving shoulder. This 
process continues as long as the axial stress is applied or until the fibre becomes 
fully drawn. If the axial stress is removed before the fibre is fully drawn, then we are 
left with a section of the core that is intact and that has not undergone fracture (the 
section with the undrawn polymer), and a section that has fractured in a periodic 
sequence of equally sized fragments (the drawn section of the polymer fibre). 
Re-applying the axial stress results in a continuation of necking and resumption 
of the associated fragmentation. After the fibre is fully drawn, no more 
fragmentation takes place. 

Extended Data Figure 1 | Schematic contrasting controlled (sequential) 
and uncontrolled (random) thread fragmentation. a, The designed 
fragmentation process that takes place during cold-drawing of a fibre 
consisting of a brittle core embedded in a ductile cladding. The overall 
length of the sample increases considerably when fully drawn. b, The 
random fragmentation that takes place during stress transfer in a 
composite sample consisting of a fibre embedded in a matrix. Thick purple 
arrows indicate externally applied stress. 

Extended Data Figure 2 | Stress-strain models of the materials used in 
the finite-element computational model. a, Axisymmetric structure used 
in the computational model. P, polymer (PES); I, interfacial layer; C, core. 
The same polymer and interfacial layer are used in the cylindrical and flat 
fibre simulations. Various core materials are used. b, Stress-strain model 
for the PES cladding (‘P’ in a). c, Stress-strain model (including both 
elastic range and post-failure softening) for the PES interfacial layer 
(‘I’ in a). d, Stress-strain model for an As2Se3 core material (‘C’ in a). 
e, Stress-strain model for a silicon (Si) core; see Supplementary Figs 14 
and 15. 

Extended Data Figure 3 | Simulation of cold-drawing in a flat fibre 
containing a thin brittle film. The results of nonlinear finite-element 
simulations showing contour plots of the evolving von Mises stress 
distribution with increasing stretch, using the same (isotropic) materials 
(PES and As2Se3) as in the cylindrical case (Fig. 2e and Extended Data 
Fig. 2). The five steps (i)-(v) correspond to increasing stretch values. 
Top panels depict the full fibre; bottom panels show the regions 
corresponding to that highlighted by the rectangle in (i). P, polymer (PES); 
G, glass (As2Se3). 

Extended Data Figure 4 | Fragmentation of a gold thin film under 
cold-drawing. Each row corresponds to a different thickness of gold (20 nm, 
30 nm, 40 nm and 70 nm) sputtered onto a 75-μm-thick PES film. The 
columns show SEM micrographs of the gold films after cold-drawing at 
two different scales to highlight the dependence of the average fragment 
size on the thickness of the gold layer. P, PES film; Au, sputtered gold. 

Extended Data Figure 5 | Spectral diffraction measurements from a 
gold film fragmented by cold-drawing. a, Optical set-up used to measure 
the spectrum of light diffracted at an angle θ from a thin gold film of 
thickness 70 nm on a 75-μm-thick PES film after fragmentation via 
cold-drawing (Extended Data Fig. 4, first row). OSA, optical spectrum analyser; 
FC, fibre coupler; θ is the angle with respect to normal incidence on the 
film. b, Measured diffracted spectra on a vertical logarithmic scale. The 
spectra are normalized with respect to the input optical spectrum. Each 
spectrum is then normalized to its maximum value. 

Extended Data Figure 6 | Fragmentation of a layer of ink on PES under 
cold-drawing. a, b, Photographs depicting the cold-drawing procedure. 
a, A line is drawn on a 75-μm-thick PES film (5 mm × 10 cm) using a 
dry-erase marker pen. b, Using two pliers, the two ends of the strip are 
pulled symmetrically by hand until cold-drawing is complete. c, After 
cold-drawing, the optical appearance of the strip changes and coloured 
diffracted bands are apparent to the naked eye (the marker pen is used to 
write across the whole film surface). d, SEM micrograph of the drawn line 
reveals that a crust is formed at the PES surface that fragments into strips 
that are orthogonal to the cold-drawing axis (similarly to in Figs 3 and 4f), 
which are behind the new optical properties of the strip seen in c. 
e, SEM micrograph of the edge of the drawn line, showing a tapering of the 
thickness of the ink crust, and concomitant drop in fragmentation period. 
I, ink-polymer crust; P, PES polymer film. 

Extended Data Table 1 | Measured values of L/D and measured and reported values of E, for different core materials 


Material                  Reported range of E 

                          65-73.1 
Si                        130-202 
Ge                        102.7-103 
PhG                       31.3-79 
Silk                      3.8-17 
ChG                       18-40 
TeG                       37.1-50.7 
Hair (refs 52, 53)        2.5-7.5 
PEO                       0.2-7 
Ice (refs 20, 44, 56)     0.3-10 
PS                        33.5 

The measured values of L/D are means, with the standard deviation (‘St. Dev.’) also given for each material. The measured and reported values of E are means, with E+A 
indicating the range of measured values, and A indicating the range of reported values. PhG, phosphate glass; ChG, chalcogenide glass; TeG, tellurite glass; PEO, polyethylene 
oxide; PS, polystyrene. Reported values are from refs 42-56, as indicated. 
We did not measure E for the TeG used in our experiments. We produced polymer-TeG fibres by thermal co-drawing, which requires that the polymer and the glass have 
overlapping softening temperatures Tso (in our context, this refers to the temperature at which the viscosity drops to values compatible with thermal drawing, typically 
10*-10® Poise). All TeG materials reported in the literature have a Tso that is substantially higher than that of engineering thermoplastic polymers such as PES. We modified the 
TeG composition to reduce Tso. However, the TeG composition we used is hygroscopic; while the TeG is isolated from the ambient environment within the PES cladding, it remains 
stable; once the PES cladding is removed (to measure E), the TeG takes up moisture and becomes brittle. We expect that E is reduced when the Tso of the material is reduced. 
We did not measure E for ice and instead plot in Fig. 2b the range of values that have been reported in the literature. This range is quite large, because E for ice depends on 
temperature, pressure, contaminants in the water, and so on. It is difficult to determine exactly the temperature of the ice during necking, but it is expected that the ice is in the 
process of melting and thus we estimate that E is on the lower end of the specified range. 


© 2016 Macmillan Publishers Limited. All rights reserved 




Extended Data Table 2 | Comparison of simulated and measured (Fig. 2b) average value of L/D (denoted L/D) and its standard deviation for 
different core materials 

L/D     St. Dev. in L/D     L/D     St. Dev. in L/D 

Polystyrene (PS) 

The cladding polymer is PES. 

Abiological catalysis by artificial haem proteins 
containing noble metals in place of iron 


Hanna M. Key, Pawel Dydio, Douglas S. Clark & John F. Hartwig 


Enzymes that contain metal ions—that is, metalloenzymes— 
possess the reactivity of a transition metal centre and the potential 
of molecular evolution to modulate the reactivity and substrate- 
selectivity of the system1. By exploiting substrate promiscuity 
and protein engineering, the scope of reactions catalysed by 
native metalloenzymes has been expanded recently to include 
abiological transformations. However, this strategy is limited by 
the inherent reactivity of metal centres in native metalloenzymes. 
To overcome this limitation, artificial metalloproteins have 
been created by incorporating complete, noble-metal complexes 
within proteins lacking native metal sites. The interactions 
of the substrate with the protein in these systems are, however, 
distinct from those with the native protein because the metal 
complex occupies the substrate binding site. At the intersection 
of these approaches lies a third strategy, in which the native metal 
of a metalloenzyme is replaced with an abiological metal with 
reactivity different from that of the metal in a native protein. 
This strategy could create artificial enzymes for abiological 
catalysis within the natural substrate binding site of an enzyme that 
can be subjected to directed evolution. Here we report the formal 
replacement of iron in Fe-porphyrin IX (Fe-PIX) proteins with 
abiological, noble metals to create enzymes that catalyse reactions 
not catalysed by native Fe-enzymes or other metalloenzymes. 
In particular, we prepared modified myoglobins containing an 
Ir(Me) site that catalyse the functionalization of C-H bonds to 
form C-C bonds by carbene insertion and add carbenes to both 
β-substituted vinylarenes and unactivated aliphatic α-olefins. 
We conducted directed evolution of the Ir(Me)-myoglobin and 
generated mutants that form either enantiomer of the products 
of C-H insertion and catalyse the enantio- and diastereoselective 
cyclopropanation of unactivated olefins. The presented method 
of preparing artificial haem proteins containing abiological metal 
porphyrins sets the stage for the generation of artificial enzymes 
from innumerable combinations of PIX-protein scaffolds and 
unnatural metal cofactors to catalyse a wide range of abiological 
transformations. 

To create artificial metalloenzymes formed by combining abiolog- 
ical metals and natural metalloprotein scaffolds, we focused on haem 
proteins, which contain Fe-porphyrin IX (Fe-PIX) as a metal cofactor. 
Native haem enzymes catalyse reactions that include C-H oxidation 
and halogenation11, and they have been successfully evolved to 
oxidize abiological substrates. Fe-PIX proteins have also been 
shown to catalyse abiological reactions involving the addition and 
insertion of carbenes and nitrenes to olefins and X-H bonds. 
However, the reactivity of the Fe-centre in haem proteins limits the 
scope of these transformations. For example, Fe-PIX proteins catalyse 
the cyclopropanation of activated terminal vinylarenes, but they 
do not catalyse reactions with internal vinylarenes or unactivated 
alkenes. Likewise, they catalyse insertions of carbenes into reactive 
N-H and S-H bonds, but they do not catalyse the insertion into less 
reactive C-H bonds. 

Because the repertoire of reactions catalysed by free metal-porphyrin 
complexes of Ru (ref. 15), Rh (ref. 16) and Ir (ref. 17) is much greater 
than that of the free Fe-analogues, we hypothesized that their incorpora- 
tion into PIX proteins could create new enzymes for abiological catalysis 
that is not possible with Fe-PIX enzymes. Artificial PIX proteins containing 
Mn, Cr, and Co cofactors have been prepared to mimic the intrinsic 
chemistry of the native haem proteins, but the reactivities and 
selectivities of these processes are lower than those achieved in the same 
reactions catalysed by native Fe-PIX enzymes. Thus, artificially metal- 
lated PIX proteins that catalyse reactions that are not catalysed by native 
Fe-PIX proteins are unknown, and the current, inefficient methods 
to prepare PIX proteins containing non-native metals have hindered the 
potential for directed evolution of the resulting enzymes. 

To evaluate rapidly the potential of artificial [M]-PIX enzymes, we 
envisioned creating an array of catalysts formed by pairing numerous 
mutants of apo-PIX proteins and [M]-cofactors in a combinatorial 
fashion. Previously, apo-PIX proteins have been prepared from native 
Fe-PIX enzymes by acidic, denaturing extraction of the Fe-cofactor, 
followed by extensive dialysis to refold the protein. This multistep 
process is too lengthy for directed evolution, and the harsh, acidic 
conditions are known to result in proteins that are heterogeneous in 
structure, which would be detrimental for selective catalysis. Alternatively, 
Ru-, Mn- and Co-PIX proteins have been expressed directly, but 
these methods are not general, require a gross excess of metal cofactor, 
and would require a time-consuming purification of each combination 
of metal and protein. To avoid the aforementioned liabilities of these 
reported methods in the creation of the proposed catalyst library, 
we sought to express directly and purify apo-PIX proteins lacking 
the entire haem unit and to reconstitute them with metal cofactors 
containing metals other than iron in a stoichiometric fashion (Fig. 1a). 

Evaluation of a series of expression conditions revealed those 
suitable for recombinant expression of the apo-form of haem pro- 
teins in Escherichia coli (Supplementary Tables 1 and 2). Under the 
optimized conditions (Supplementary Fig. 1 and Supplementary 
Table 1), involving minimal media lacking Fe to minimize the bio- 
synthesis of hemin and low temperature to mitigate the instability 
of the apo-form, we expressed successfully the protein containing 
less than 5% of the Fe-PIX cofactor, as determined by inductively 
coupled plasma optical emission spectroscopy (ICP-OES). In par- 
ticular, mutants of Physeter macrocephalus myoglobin (Myo) and 
Bacillus megaterium cytochrome P450 BM3h (P450) with and 
without an mOCR stability tag were overexpressed in high yields 
and purified (up to 70 mg l−1 of protein; Fig. 1a, Supplementary 
Tables 1 and 2). Circular dichroism spectroscopy revealed that 
these apo-proteins retain the fold of their native Fe-PIX analogues 
(Fig. 1b, Supplementary Fig. 2). The obtained apo-proteins were 
reconstituted quantitatively upon addition of stoichiometric amounts 
of various [M]-PIX cofactors, as determined by native nano-electrospray 
ionization mass spectrometry (Supplementary Fig. 3). Moreover, 
reactions catalysed by reconstituted Fe-myoglobin and Fe-P450 
occurred with the same enantioselectivities as those catalysed by native 
Fe-proteins (Supplementary Fig. 4), providing strong evidence that 
this method indeed generates [M]-PIX-proteins with the intact active 
site and with the cofactors bound at the native PIX-binding site. Further 
studies revealed that reconstituted mOCR-myoglobins are stable on 
storage; reactions catalysed by freshly prepared, frozen, and lyophilized 
enzymes proceeded with comparable enantioselectivity (see below, 
Supplementary Fig. 5). 

Figure 1 | Strategy for expedient preparation of 
[M]-PIX-proteins. a, Direct expression, Ni-NTA 
purification, and diverse metallation of 
apo-PIX proteins to generate PIX proteins 
containing Co, Cu, Mn, Rh, Ir, Ru, and Ag sites or 
b, Comparison of the circular dichroism spectra 
obtained from directly expressed apo-Myo, the 
same protein reconstituted with Fe-PIX (hemin), 
and the same mutant expressed as a native 
Fe-PIX protein. 

Following this method, we directly expressed eight variants of 
apo-mOCR-Myo-H93X, each carrying a different mutation to the 
axial ligand position (H93X). Upon reconstitution of each variant 
with nine different porphyrin cofactors (containing Fe(Cl)-, Co(Cl)-, 
Cu-, Mn(Cl)-, Rh-, Ir(Cl)-, Ir(Me)-, Ru(CO)- and Ag-sites, 
Supplementary Table 2), we rapidly accessed 72 potential catalysts 
whose activity profiles are distinct from those of wild-type myoglobin, 
owing to the identity of the metal centre and the amino acid residue 
serving as the axial ligand (Fig. 2a, b). 
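
The combinatorial pairing described above (eight axial-ligand variants, each reconstituted with nine cofactors) can be sketched as a simple Cartesian product. The cofactor labels are taken from the text; the variant labels `H93X_1` to `H93X_8` are placeholders, because the individual residues at position 93 are not listed here, and `build_catalyst_array` is a hypothetical helper name.

```python
from itertools import product

# Cofactor metal sites listed in the text
COFACTORS = [
    "Fe(Cl)", "Co(Cl)", "Cu", "Mn(Cl)", "Rh",
    "Ir(Cl)", "Ir(Me)", "Ru(CO)", "Ag",
]

# Placeholder labels: the text gives the number of H93X variants (eight),
# not the identity of each substituted residue.
AXIAL_VARIANTS = [f"H93X_{i}" for i in range(1, 9)]


def build_catalyst_array(variants, cofactors):
    """Pair every apo-protein variant with every cofactor."""
    return list(product(variants, cofactors))


catalysts = build_catalyst_array(AXIAL_VARIANTS, COFACTORS)
print(len(catalysts))  # 8 variants x 9 cofactors
```

Each tuple in `catalysts` stands for one reconstituted protein to be screened, which is how the 72 potential catalysts arise.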

Natural haem proteins functionalize C-H bonds to form C-O 
bonds11, but no haem protein is known to functionalize a C-H bond 
to form a C-C bond. To identify an enzyme for the insertion of a car- 
bene into a C-H bond, the array of artificial [M]-mOCR-myoglobins 
containing various metals and axial ligands was evaluated for the 
reaction of diazoester 1 to form chiral dihydrobenzofuran 2 (Fig. 2a). 
All myoglobins formed from the native Fe-PIX cofactor were inactive, 
regardless of the axial ligand. In contrast, non-native metals formed 
active catalysts when paired with an appropriate axial ligand. The 
most active catalysts, those containing Ir(Me)-PIX, were formed by 
incorporating both an abiological metal (Ir) and an abiological axial 
ligand (-CH3) that cannot be incorporated through standard mutagenesis 
techniques. The eight myoglobins containing Ir(Me)-PIX formed 
enantioenriched dihydrobenzofuran 2 in up to 50% yield before any 
further mutagenesis (see below). Moreover, this artificial enzyme 
tolerated modifications to all portions of the substrate; diazoesters 
6-11, containing varied ester, arene, and alkoxy functionalities 
(Fig. 3a, Supplementary Fig. 6), also underwent C-H insertion in the 
presence of Ir(Me)-PIX-Myo. Together, these results show that the 
multi-dimensional evaluation of reconstituted PIX-enzymes can 
identify new artificial metalloenzymes that catalyse reactions that 
biological Fe-PIX-proteins do not catalyse. 

A substantial benefit of using enzymes for synthetic applications is 
the potential to use directed evolution to obtain a catalyst with desired 
properties. To develop an enantioselective catalyst for each of the 
seven substrates undergoing C-H insertion, we followed a hybrid 
strategy based on stepwise optimization of small sets of amino acids 
progressively more distal from the reaction site (Fig. 4). In the first 
phase, the axial ligand (H93) was modified to A or G and the resi- 
due directly above the metal centre (H64) was modified to A, V, L, or 
I (Fig. 4) to give an initial set of eight mutants. In the second phase, 
these initial eight mutants were modified at positions F43 and V68, 
which are located in the binding site (Fig. 4), to generate 225 prospec- 
tive enzymes. To retain the hydrophobicity of the site that binds the 
porphyrin and substrate, only hydrophobic and uncharged residues 
(V, A, G, F, Y, S, T) were introduced at positions F43 and V68. Of these 
225 mutants, 22 that were among the most selective for one or more of 
the substrates in Fig. 3 were subjected to a further round of evolution 
during which the residues at four additional positions (L32, F33, H97, 
and I99) were modified to generate 217 more mutants (Fig. 4). 

The complete results of the carbene insertion reaction with these 
mutants are provided in Supplementary Tables 5-11 and are summa- 
rized in Fig. 3b and Supplementary Table 4. The directed evolution of 
Ir(Me)-myoglobins uncovered distinct enzymes catalysing the C-H 
functionalization to form either enantiomer of the products containing 
a new C-C bond formed from substrates 1 and 6-11. The reactions 
occurred with selectivities up to an enantiomeric ratio (e.r.) of 92:8 and 
with yields up to 97% (Fig. 4, Supplementary Fig. 7 and Supplementary 
Tables 4-11) with enzymes that were evolved from those giving nearly 
racemic product. The Ir-myoglobins are suitable catalysts for synthetic- 
scale reactions; the carbene insertion of substrate 11 formed the product 
containing a new C-C bond in 80% isolated yield from a reaction of 
28 mg of 11 with nearly the same enantioselectivity as observed on 
smaller scale (Fig. 3). A reaction conducted with a 40,000:1 ratio of 
substrate to Ir(Me)-mOCR-myo occurred with a turnover number 
(TON) of 7,200 (Fig. 3). 
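
The relationships among catalyst loading, yield, turnover number and enantiomeric ratio used in this paragraph can be made explicit with a short sketch. It assumes the usual definitions (TON as moles of product per mole of catalyst, and enantiomeric excess as the difference between the two enantiomer percentages); the function names are hypothetical, and the 18% yield printed below is derived from the quoted numbers rather than reported in the text.

```python
def turnover_number(yield_fraction: float, substrate_to_catalyst: float) -> float:
    """TON = moles of product per mole of catalyst."""
    return yield_fraction * substrate_to_catalyst


def yield_from_ton(ton: float, substrate_to_catalyst: float) -> float:
    """Fractional yield implied by a TON at a given substrate:catalyst ratio."""
    return ton / substrate_to_catalyst


def ee_from_er(major: float, minor: float) -> float:
    """Enantiomeric excess (%) from an enantiomeric ratio major:minor."""
    return 100.0 * (major - minor) / (major + minor)


# 40,000:1 substrate:catalyst corresponds to a loading of 0.0025 mol%
loading_percent = 100.0 / 40000
# A TON of 7,200 at this ratio implies a yield of 18% (derived, not reported)
implied_yield = yield_from_ton(7200, 40000)
# The best reported e.r. of 92:8 corresponds to 84% e.e.
best_ee = ee_from_er(92, 8)
# At 0.5 mol% loading (200:1) the TON cannot exceed 200
max_ton_standard = turnover_number(1.0, 200)

print(loading_percent, implied_yield, best_ee, max_ton_standard)
```

Under these definitions, the standard screening conditions (0.5% catalyst) cap the TON at 200, which is consistent with the TON values quoted for individual mutants.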

In contrast to the few directed evolutions of artificial enzymes 
reported previously, our method of preparing variants of the 


Figure 2 | Evaluation of artificial [M]-PIX 
mOCR-myoglobins as catalysts. 

a, b, Catalysts for the insertion of 
carbenes into C-H bonds (a; 1 → 2) 
and for the addition of carbenes to an 
for the C-H insertion reaction: 10 mM 
substrate and 0.5% catalyst in 250 μl 
buffer (10 mM Tris, pH 8.0 containing 
8 vol.% MeCN). Reaction conditions 
for the cyclopropanation reaction: 
10 mM olefin, 30 mM EDA, and 0.5% 
catalyst in 250 μl buffer (10 mM Tris, 
pH 8.0 containing 8 vol.% MeCN). 

1    93A, 64L, 43L, 33V           85:15 (134)   93G, 64L, 68A, 33F, 99V   16:84 (62) 
6    93A, 64V, 43Y, 68A, 97W      81:19 (128)   93A, 64L, 68A, 99V        20:80 (26) 
7    93G, 64L, 43L, 99F           92:8 (92)     93A, 64L, 43W, 68A, 97Y   25:75 (68) 
8    93A, 64V, 68A, 103C, 108C    90:10 (98)    93A, 64L, 43W, 68T        23:77 (164) 
9    93A, 64L, 43W, 68A, 33I      85:15 (86)    93A, 64L, 43I, 68T        25:75 (138) 
10   93A, 64V, 43H, 68S           77:23 (44)    93A, 64V, 68A, 33V, 97Y   17:83 (44) 

Figure 3 | Summary of activities and selectivities 
for reactions catalysed by Ir(Me)-mOCR-Myo. 
a, Substrates (1, 6-11) for C-H insertion reactions 
and the most selective mutants identified through 
the directed evolution. Notations (+)- and 
(−)-enantiomer distinguish the formation of 
opposite enantiomers of the product. *Reaction 
on a 0.12 mmol scale of substrate, TON based 
on isolated yield of product (80%). †Reaction 
on a 0.05 mmol scale of substrate with 0.0025% 
catalyst. Reaction conditions: 10 mM substrate and 
0.5% catalyst in 250 μl buffer (10 mM Tris, pH 8.0 
containing 8 vol.% MeCN). b, Carbene addition to 
internal and aliphatic olefins catalysed by mutants 
of Ir(Me)-mOCR-Myo that were found to be the 
most selective. d.r., diastereomeric ratio. Reaction 
conditions: 10 mM olefin, 60 mM EDA, and 0.5% 
catalyst in 250 μl buffer (10 mM Tris, pH 8.0 
containing 8 vol.% MeCN). EDA was added over 
12 h via syringe pump. 


Ir(Me)-PIX-enzyme enabled us to pursue an individual, eight-site 
evolutionary trajectory for the reaction of each substrate that identified 
catalysts selectively forming either enantiomer of all targeted 
products. These results demonstrate that Ir(Me)-PIX-myoglobins are 
highly evolvable for different substrates containing varied structural 
modifications. These results, along with the high isolated yield and 
the observation of high TONs, demonstrate that the direct expression 
of apo-Myo, the insertion of diverse [M]-PIX cofactors, and the subsequent 
directed evolution of the most active enzymes identified 
is a robust strategy that can be applied in a general way to create 
stereoselective enzymes for abiological catalysis that cannot be accomplished 
by any natural enzymes. 

Figure 4 | Directed evolution strategy used 
to obtain Ir(Me)-mOCR-Myo mutants 
capable of producing either enantiomer of 
the products of C-H insertion reactions of 
varied substrates. Inner sphere (red boxes), 
middle sphere (blue), and outer sphere (yellow) 
residues are highlighted in the depiction of the 
active site (top left) and in the evolutionary 
tree. (Image of active site and its surroundings 
produced in Chimera from PDB 1MBN.) 
The strategy is exemplified by showing the 
enantioselectivities achieved in the formation 
of the product boxed on the right. Mutants 
positioned above the red dotted line formed 
predominantly the opposite enantiomer of 
those shown below the dotted red line. Some 
mutants are shown in boxes that are semi- 
transparent for clarity of the figure. In the case 
of selectivities obtained for additional products, 
the e.r. values given in green and purple colour 
are those for reactions forming predominantly 
the opposite enantiomers. The reactions were 
run under the conditions described in Fig. 3. 


To assess the generality of this approach further, we sought 
catalysts for the cyclopropanation of internal alkenes and α-olefins that, 
like carbene insertion into C-H bonds, have not been accomplished 
with natural or artificial enzymes. As a starting point, we evaluated 
the two-dimensional array of [M]-PIX catalysts shown in Fig. 2b for 
the cyclopropanation of β-methylstyrene 3 with ethyl diazoacetate 
4 (EDA). In agreement with literature reports, Fe-PIX enzymes did 
not catalyse this reaction (Fig. 2b). In contrast, Rh-, Ru-, and Ir-PIX 
enzymes furnished the cyclopropane product 5. The enzyme containing 
Ir(Me)-PIX was the most active. Although further work is needed 
to obtain full conversion and high enantiomeric excess (e.e.), the 
reaction of EDA with β-methylstyrene catalysed by the Ir(Me)-mOCR-Myo 
mutant H93A, H64V, F43Y, V68A, H97F formed the cyclopropane 
5 with a TON of 40, an e.r. of 70:30 and a high (>33:1) ratio of 
diastereomers, favouring the trans isomer. 

Having observed the expanded scope of enzyme-catalysed cyclo- 
propanation, we assessed the ability of Ir(Me)-PIX enzymes to catalyse 
the cyclopropanation of 1-octene, an unactivated, aliphatic olefin. The 
series of Ir(Me)-PIX-Myo enzymes assessed for C-H insertion reactions 
were tested as catalysts for the reaction of EDA with 1-octene. Although 
the mutant H93A, H64A, V68F formed the products of C-H insertion 
from all substrates unselectively, the same mutant formed the prod- 
uct of cyclopropanation of 1-octene in an enantiomeric ratio of 91:9 
and a trans:cis ratio of 40:1 (Figs 3b, 4; Supplementary Tables 12-14). 
Cyclopropanations of aliphatic alkenes catalysed by traditional metal 
complexes are typically conducted with an excess of the alkene. In 
contrast, the Ir(Me)-PIX-Myo mutant catalyses the reaction with 
an excess of EDA (a TON of 42 with a 10:1 ratio of EDA:1-octene), 
suggesting that reactions can be developed with valuable alkenes as 
limiting reagent. The reactions with fewer equivalents of EDA occur 
with lower TON due to consumption of EDA by dimerization or O-H 
insertion of water. These cyclopropanations of unactivated alkenes 
show the broad potential to evolve artificial myoglobins containing 
abiological active sites for reactions that are not catalysed by enzymes 
containing native metals. 

The work presented here demonstrates that unknown enzymatic 
reactivity can be achieved by incorporating just a metal ion with 
an accompanying small ligand into a well-known metalloprotein, 
while retaining the native structure of the active site. Selectivity for 
specific substrates, then, can be achieved readily by directed evolution. 
Considering the rich chemistry of free metalloporphyrins and the ease 
of preparation and evolution of haem proteins containing diverse metals 
by the methods just described, this methodology should seed the 
creation of many new artificial metalloenzymes with diverse, unnatural 
reactivity. Moreover, the facile, direct expression of apo-haem proteins 
could be used in tandem with strategies to incorporate highly active 
noble-metal complexes of ligands beyond porphyrins. Access to such a 
range of artificial haem proteins provides a nearly limitless opportunity 
to achieve catalytic reactions with selectivity derived from the interaction 
of the substrate with a natural, evolvable binding site. 

Over 50% of patients who survive neuroinvasive infection with West 
Nile virus (WNV) exhibit chronic cognitive sequelae. Although 
thousands of cases of WNV-mediated memory dysfunction accrue 
annually, the mechanisms responsible for these impairments are 
unknown. The classical complement cascade, a key component of 
innate immune pathogen defence, mediates synaptic pruning by 
microglia during early postnatal development. Here we show that 
viral infection of adult hippocampal neurons induces complement- 
mediated elimination of presynaptic terminals in a murine WNV 
neuroinvasive disease model. Inoculation of WNV-NS5-E218A, a 
WNV with a mutant NS5(E218A) protein®’ leads to survival rates 
and cognitive dysfunction that mirror human WNV neuroinvasive 
disease. WNV-NS5-E218A-recovered mice (recovery defined as 
survival after acute infection) display impaired spatial learning and 
persistence of phagocytic microglia without loss of hippocampal 
neurons or volume. Hippocampi from WNV-NS5-E218A-recovered 
mice with poor spatial learning show increased expression of genes 
that drive synaptic remodelling by microglia via complement. 
C1QA was upregulated and localized to microglia, infected neurons 
and presynaptic terminals during WNV neuroinvasive disease. 
Murine and human WNV neuroinvasive disease post-mortem 
samples exhibit loss of hippocampal CA3 presynaptic terminals, 
and murine studies revealed microglial engulfment of presynaptic 
terminals during acute infection and after recovery. Mice with fewer 
microglia (Il34-/- mice with a deficiency in IL-34 production) or
deficiency in complement C3 or C3a receptor were protected from 
WNV-induced synaptic terminal loss. Our study provides a new 
murine model of WNV-induced spatial memory impairment, 
and identifies a potential mechanism underlying neurocognitive 
impairment in patients recovering from WNV neuroinvasive 
disease. 

Studies in humans and rodents indicate that WNV targets neurons 
within the hippocampus’, which is essential for spatial and contextual 
memory formation’. Patients that survive WNV neuroinvasive 
disease often exhibit impaired visuospatial processing and memory’"”. 
In post-mortem samples of patients with WNV neuroinvasive disease 
and in rodent models with low survival rates (<50%), significant 
neuronal loss, inflammation, and microglial activation occur within 
infected brain regions!"!*, However, the extent of viral burden and 
neuron loss may be much lower in individuals who survive WNV 
neuroinvasive disease and may vary between brain regions. In addition, 
host-pathogen interactions could explain the range of cognitive sequelae
experienced by WNV neuroinvasive disease survivors’.

Mechanisms underlying cognitive impairments in patients recov- 
ering from West Nile neuroinvasive disease are unknown, mainly due 
to lack of murine recovery models. Current models that use virulent 
WNV strains yield either 100% death following intracranial WNV
infection or 10-70% survival and variable CNS viral burdens following 
peripheral routes of infection’. To circumvent this in order to develop 
a model of recovery from WNV neuroinvasive disease, we used a strain 
of WNV with a point mutation in nonstructural protein 5 NS5(E218A), 
which lacks functional 2’-O methyltransferase that generates a 5’ cap 
on viral RNA to evade type I interferon-mediated restriction of viral 
translation®'®. While WNV-NS5-E218A replicates in permissive cells, 
90% of 8-week-old mice survive intracranial inoculation (Extended 
Data Fig. 1a), with uniform viral brain burdens peaking between
6-8 days post-infection (dpi)’, followed by viral clearance at 15 dpi
(Extended Data Fig. 1b, c). Intracranial infection with WNV-NS5- 
E218A induces neuroinflammation within 7 dpi, with numbers and 
phenotypes of infiltrating leukocytes similar to intracranial infection 
with virulent strain WNV-NY99 (Extended Data Fig. 1d), and consistent 
with data demonstrating early reversion of WNV-NS5-E218A to 
wild-type virus within the central nervous system’. WNV-NS5-E218A 
intracranial infection results in few apoptotic neurons (0.5-1.5%) at 
7 dpi (Extended Data Fig. 1e), similar to 8-week-old mice after footpad
infection with WNV-NY99 (ref. 12). Thus, WNV-NS5-E218A may be 
used to examine behavioural, cellular, and molecular mechanisms of 
recovery from WNV neuroinvasive disease. 

The Barnes maze behavioural task was used to determine whether 
WNV-NS5-E218A-recovered mice exhibit neurocognitive deficits!”. 
At 46 dpi, WNV-infected mice exhibit slower learning, commit more 
errors (Fig. 1a) and require more time (Fig. 1b) before locating the
target hole than mock-infected controls (Supplementary Videos 1 
and 2). Studies performed in mice at 22 dpi showed similar effects 
(Fig. 1e and Extended Data Fig. 1f). WNV-NS5-E218A-recovered
animals improve over the 5 days of training, although they continue to 
make significantly more errors than controls on each day. Recovered 
animals do not exhibit impairments in motor activity or exploratory 
anxiety in an open-field behavioural assessment at 45 dpi (Fig. 1c, d) or 
21 dpi (Extended Data Fig. 1g), indicating that impairments in Barnes 
maze solving are specific to spatial learning. 

Caspase-3-dependent apoptosis of neurons occurs in animals 
that succumb to WNV neuroinvasive disease’”. TUNEL and NeuN- 
staining did not detect ongoing apoptosis (Extended Data Fig. 2a) or 
loss of NeuN+ neurons within the circuitry of the entorhinal cortex
and hippocampus that serve spatial learning (Extended Data Fig. 2b). 

Figure 1 | WNV-mediated spatial learning and memory impairments
and activated microglia persist beyond 45 days post-infection. a, b, At
46 days post-infection (dpi), mock or WNV-NS5-E218A-infected mice
underwent 5 days of testing on the Barnes maze spatial learning task.
Errors (a) and latency (b) before finding target hole were scored daily
(mean of 2 trials per day, ***P < 0.001, *P < 0.05 by repeated measures
two-way ANOVA). c, d, At 45 dpi, mice were observed on the open-field
test and assessed for locomotor activity (c) and anxiety (d). a-d, Mock
(n = 27) and WNV-NS5-E218A-infected (n = 23) mice. e, Mock (n = 23)
and WNV-NS5-E218A-infected (n = 26) mice were tested at 22 dpi on a
3-day version of the Barnes maze, and evaluated as in a. f, Immunostaining
for IBA1 in control and WNV-NS5-E218A-infected mice at 7 dpi (n = 6
or 7 per group for control or WNV, respectively), 25 dpi (n = 3 or 4 for


Similarly, hippocampal and total brain volumes at 52 dpi do not 
differ between mock-infected and WNV-NS5-E218A-infected animals
(Extended Data Fig. 2c), and GFAP staining within the hippocampus 
was unchanged (Extended Data Fig. 2d). However, microglial nodules 
were detected within the hippocampus of WNV-NS5-E218A-recovered 
animals at 7, 25, and 52 dpi (Extended Data Fig. 3b). These nodules
contained increased numbers of IBA1-positive cells with activated cell
morphology (Fig. 1e). At 7 dpi, increased levels of CD68, a marker of
microglial/macrophage lysosomal activation, were observed within
both microglia and infiltrating macrophages of CX3CR1-GFP+/- mice
(Fig. 1g, h); however, macrophage infiltration was absent by 25 dpi
(Extended Data Fig. 2f). 

Whole-transcriptome microarray of hippocampi RNA from 
mock-infected versus WNV-NS5-E218A-infected mice at 25 dpi 
enabled detection of differential expression of 1,364 transcripts. 
Pathway analysis identified signatures associated with the generation 
and maintenance of synapses, activation of innate immune responses, 
and microglial proteins involved in sensing endogenous ligands and 
microbes'® (Supplementary Table 1). In the latter category, we
identified genes associated with microglial-mediated phagocytosis (Cx3cr1
which encodes CX3CR1; Dap12 which encodes Dap12 (also known
as Tyrobp); Fcer1g which encodes FcεR1G; Fcgr2b which encodes
FcγR2; Rac2 which encodes Rac2 and Was which encodes WAS) and
the classical complement pathway (C1qa which encodes C1QA;
C2 which encodes C2; C3 which encodes C3; C4b which encodes C4b
and Serp1 which encodes Serping1) (Fig. 2a), which were validated
using quantitative PCR (qPCR) (Fig. 2b). C1q and C3 are required
for retinogeniculate and cortical synaptic pruning during murine 
CNS development**!°. Although complement contributes to control 

control or WNV, respectively), and 52 dpi (n = 6 or 4 for control or
WNV, respectively) (mean of 2 technical replicates used). g, h,
Immunostaining shows increased levels of CD68, a microglial/macrophage
lysosomal activation marker, in WNV-NS5-E218A-infected
wild-type mice (g) (n = 4 mice per group) and CX3CR1-GFP+/- (h)
(n = 3 mice per group) mice. h, CD68 is present within CX3CR1-positive
microglia (white arrowheads) and infiltrating macrophages
(red arrowheads). Images are representative of at least 3 mice per
group. All panels, ***P < 0.001, *P < 0.05, NS, not significant by
two-tailed t-test, and scale bars, 10 μm, unless otherwise noted. Error bars,
s.e.m. Immunostaining and quantification were performed within the
hippocampal CA3 region.


of WNV dissemination following peripheral infection’, complement 
expression within the brain during WNV neuroinvasive disease has 
not been investigated. 

To identify genetic signatures specific to spatial learning defects, 
we categorized WNV-NS5-E218A-recovered mice into those that 
perform similar to mock-infected animals (good learners categorized 
as <8 errors) and those that exhibit severe learning deficits (poor 
learners categorized as >9.5 errors), on day 2 of Barnes maze testing 
(Figs 1e and 2c-e). A total of 747 genes were increased in the WNV
poor learners, 45 genes altered in WNV good learners, and 572 genes 
altered in both groups compared to mock-infected littermates (Fig. 2f) 
(Supplementary Tables 2 and 3). WNV-NS5-E218A-recovered animals 
with poor learning exhibited increased levels of CRRY (also known as 
Cr1l), Dap12, FcγR2, Rac2, and C1QA and decreased levels of Dlg2,
a synaptic scaffolding protein, and the metabotropic glutamate receptor,
Grm5, compared with good learners and mock-infected animals
(Fig. 2g). Grm5 is downregulated in mouse brain during acute WNV,
Japanese encephalitis virus, and reovirus infections”!. KEGG pathway
analysis of upregulated genes in WNV poor memory compared to
WNV good memory mice revealed top pathways of cytokine signalling,
calcium signalling, and B-cell receptor signalling (Supplementary
Table 4), whereas top pathways of downregulated genes include
long-term potentiation, axon guidance, and Wnt signalling (Supplementary
Table 5). 
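
The grouping rule used above (good learners, fewer than 8 errors; poor learners, more than 9.5 errors on day 2 of Barnes maze testing) can be written out explicitly. The Python sketch below is illustrative only: the function name is hypothetical, and leaving mice that fall between the two thresholds unclassified is an assumption, not something stated in the text.

```python
from typing import Optional

# Thresholds taken from the text: errors committed on day 2 of Barnes maze testing.
GOOD_MAX_ERRORS = 8.0   # good learners commit fewer than 8 errors
POOR_MIN_ERRORS = 9.5   # poor learners commit more than 9.5 errors


def classify_learner(day2_errors: float) -> Optional[str]:
    """Return 'good', 'poor', or None for a WNV-recovered mouse.

    None marks scores between the two thresholds; how such mice were
    handled is not described in the text, so this is an assumption.
    """
    if day2_errors < GOOD_MAX_ERRORS:
        return "good"
    if day2_errors > POOR_MIN_ERRORS:
        return "poor"
    return None


if __name__ == "__main__":
    for errors in (5.0, 8.5, 12.0):
        print(errors, classify_learner(errors))
```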

Loss of neurons or persistence of virus could contribute to cognitive
dysfunction. Evaluation of neuronal numbers throughout the
hippocampus and entorhinal cortices of mock-infected and WNV- 
infected good and poor learners revealed no differences (Extended 
Data Fig. 2b). Analyses of WNV envelope protein positive and negative 

Figure 2 | Transcriptional profile of good and poor spatial learners
during WNV recovery. a, Heat maps show relative expression of
significantly altered genes (see Methods) generated from hippocampal
microarray of mock vs WNV-NS5-E218A-recovered mice at 25 dpi;
each column represents an individual mouse. b, Validation of select genes
and pathways in a unique set of mice by qPCR (mock (n = 5) and
WNV-NS5-E218A (n = 6) mice). c, Scatter plot depicting number of errors
committed on day 2 of Barnes maze testing, showing good (blue) and poor
(green) learners among WNV-NS5-E218A-infected and mock-infected
(red) controls. d, Principal component analysis of microarray samples
separated by groups as in c. WNV, West Nile virus. e, Relative expression


strand RNA from the hippocampi of WNV-NS5-E218A-recovered 
mice showed high levels of both strands at 7 dpi, which decreased by 
25 and 52 dpi, with no differences in levels of either strand between 
good and poor learners at either time point (Extended Data Fig. 1h-j). 
These data suggest that persistence of replicating WNV-NS5-E218A 
does not contribute to alterations in learning. 

Given the alterations in genes related to synaptic function, we quanti- 
fied synaptic terminals within the hippocampus of WNV-NS5-E218A- 
infected mice. Numbers of colocalized presynaptic and postsynaptic 
puncta within the stratum lucidum of the hippocampal CA3 (mossy fibre 
terminals) were decreased at 7 dpi in WNV-NS5-E218A-infected animals 
compared to mock-infected controls (Fig. 3a). The decrease in colocali- 
zation was traced to a 40% reduction in number, but not size, of presyn- 
aptic terminals (Extended Data Fig. 3a), with no change in numbers of 
postsynaptic terminals (Fig. 3a). Altered expression of the presynaptic 
glutamatergic vesicular transporter, VGlut1 (also known as Slc17a7), in 
the hippocampus has been linked to cognitive impairment in rodents”. 
Evidence of hippocampal glutamatergic synapse loss was detectable at 
25 dpi and WNV-NS5-E218A-recovered mice with poor spatial learn- 
ing exhibited fewer VGlut1-positive synaptic puncta than WNV-NS5- 
E218A-infected good learners, which were fewer than in mock-infected 
controls (Fig. 3b). WNV-NY99 infection led to similar reductions in 
synaptic terminals (Extended Data Fig. 3b). Of note, mock-infected and 
WNV-NS5-E218A-recovered mice (all learners) display similar levels of
phosphorylated neurofilament heavy chain (SMI-31) within hippocampal 
mossy fibre tracts (Extended Data Fig. 3c). These results indicate that 
axons are preserved despite elimination of synapses, suggesting that 
synapse elimination does not lead to neuronal death. 

Acute post-mortem WNV neuroinvasive disease patient speci- 
mens similarly show reduced numbers of CA3 presynaptic terminals 

(mock (n = 5), WNV good (n = 3), and WNV poor (n = 3) mice). All
panels, ***P < 0.001, *P < 0.05, NS, not significant by two-tailed t-test.

compared to age-matched patient controls (Fig. 3c and Supplementary
Table 5). In 3 out of 5 WNV cases, WNV antigen was detected 
within the CA2/CA3 region, but absent from neighbouring regions 
(Supplementary Table 6 and Extended Data Fig. 4a). Hippocampal CA1 
and entorhinal cortex in the human samples also displayed presynaptic 
terminal loss (Extended Data Fig. 4b, c). These data show that synaptic 
pathology is observed in areas without detectable viral antigen. Thus, 
WNV infection may alter synapse homeostasis both in affected and
connected brain regions. 

Given their phagocytic appearance in WNV-recovered mice, 
we wondered if microglia were driving synapse loss. Three-dimensional 
reconstructions of microglia from CX3CR1-GFP+/- WNV-NS5-E218A-infected
mice (Fig. 3d) revealed synaptophysin-positive puncta within
GFP+ cells (Supplementary Videos 3 and 4). Additionally, presynaptic
terminals from mice with fewer and less proliferative microglia
(Il34-/-) were not eliminated during WNV-NS5-E218A acute infection
(Fig. 3e). Colocalization of synaptophysin, lysosomal-associated
membrane protein 1 (LAMP1), and IBA1, revealed increased numbers
of presynaptic terminals within the lysosomes of IBA1-positive
cells, but not S100β+ astrocytes (Extended Data Fig. 3d), in the CA3
of WNV-NS5-E218A-infected mice at 7 and 25 dpi compared with 
mock-infected animals (Fig. 3f). Numbers of engulfed presynaptic 
terminals recover to baseline levels by 52 dpi, suggesting that presyn- 
aptic elimination eventually abates. Electron microscopy of the CA3 
and molecular layer (Fig. 3g) during acute WNV neuroinvasive disease 
identified microglia enriched in phagosomes (Extended Data Fig. 3e) 
and with processes that surround synapses (Fig. 3g). 

Macrophage-mediated phagocytosis in the periphery often requires 
antibody and complement deposition; however, no differences in
the amount of endogenous mouse IgG coating VGlut1-positive or 
Homer1-positive CA3 synaptic terminals at 7 and 25 dpi following
WNV-NS5-E218A infection were observed compared to mock-infected
controls (Extended Data Fig. 3f, g). Furthermore, μMT-/-
mice, which lack mature B cells, eliminate synaptic terminals during 
acute WNV-NS5-E218A infection at similar levels as in wild-type mice 
(Extended Data Fig. 3h). Hippocampal upregulation of C1QA, the 
initiating factor of the classical complement cascade, was detected 
at 7, 25 and 52 dpi compared to mock-infected controls (Fig. 4a). 
Hippocampal C1qa mRNA in conjunction with IBA1 expression
was detected in mock-infected and WNV-NS5-E218A-infected mice
at 7 dpi, with increased levels in the latter (Fig. 4b). C1QA protein
was detected in IBA1+ cellular processes adjacent to or surrounding
neurons (Fig. 4b, c). 

C1QA also colocalized within WNV antigen-positive cells with 
neuronal morphology (Fig. 4d) and with Map2-positive neurites 
(Fig. 4e). Increased levels of the complement C3 cleavage product, 
C3d, also colocalized with VGlut1-positive terminals at 7 dpi
(Fig. 4f). Levels of colocalization of C1QA with VGlut1-positive 

Figure 4 | Classical complement activation in neurons and microglia
drives WNV-mediated synaptic terminal elimination. a, qPCR analysis
of hippocampal C1qa mRNA normalized to Gapdh in mock (n = 8) or
7 dpi (n = 3), 25 dpi (n = 8) or 52 dpi (n = 7) after WNV-NS5-E218A
infection. b, Fluorescent in situ hybridization using RNA probes for
neuron specific enolase (NSE) with sense or antisense C1QA coupled
with immunostaining for IBA1 in WNV-NS5-E218A-infected or
control mice with high magnification insets. Images are representative
of 3 mice per group. Scale bars, 50 μm. c, Immunostaining for C1QA
protein and IBA1 with high magnification insets. Arrowheads depict
colocalization. d, Immunostaining for C1QA protein and WNV antigen
at 7 dpi with a WNV-infected neuron shown in high magnification
inset. e, Immunostaining for C1QA with neuronal marker, Map2.
f, Immunostaining shows colocalization of presynaptic marker,

presynaptic terminals were significantly increased in WNV-NS5- 
E218A-recovered animals compared with mock-infected controls 
at 7 and 25, but not 52 dpi (Fig. 4g). WNV-NS5-E218A-infected 
complement C3-/- and C3ar1-/-, but not CR1/2-/- or Itgam-/-
(Itgam is also known as Cr3), mice showed no differences in
synaptophysin-positive synaptic terminals compared to mock-infected
controls (Fig. 4h). Presynaptic puncta within microglial (IBA1+)
lysosomes were also decreased in WNV-NS5-E218A-infected
complement C3-/- and C3ar1-/- compared to wild-type mice (Fig. 4i),

g, Immunostaining showing colocalization of C1QA and VGlut1 with
representative super-resolution micrographs shown. Scale bars, 5 μm.
h, Synaptophysin (Syp) immunostaining in WNV-NS5-E218A-infected
wild-type, complement C3-null, complement receptors C3aR-null,
CR1/2-null, and CR3-null mice at 7 dpi, normalized to age and genotype-
matched, mock-infected controls. i, Immunostaining and quantification

indicating that WNV-mediated elimination of synaptic terminals
requires C3 and C3aR. 

Many pathogens and pathogen-associated molecular patterns induce 
complement activation within the central nervous system, including 
WNV“, HIV’®, and amyloid plaques”®. Our study suggests that 
complement C3 and C3aR mediate presynaptic terminal loss in the 
hippocampi of mice that exhibit spatial learning defects during recovery 
from West Nile neuroinvasive disease. Microglia and recognition of C3 
cleavage products by complement receptor C3aR, which is expressed by 
both neurons and microglia”’, are required for this process. Although
astrocytes did not exhibit increased colocalization with synaptic
terminals, we cannot completely rule out their contribution to this
process, as their rates of lysosomal digestion differ from those of microglia”.
Furthermore, alterations to NMDA receptor-mediated long-term 
potentiation could also contribute to altered synapse homeostasis and 
memory as several genes in this pathway are differentially expressed in 
WNV-poor learners (Supplementary Table 4). It is unknown whether 
complement-labelled and eliminated terminals are connected to WNV-infected
or healthy neurons. In the context of neurotropic viral infection,
elimination of presynaptic terminals may prevent trans-synaptic viral 
spread or aberrant signalling of infected neurons. Further studies will 
determine if complement- and microglial-dependent synapse elimina- 
tion prevents neuron-to-neuron spread of WNV and other neurotropic 
viruses. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 

METHODS
Animals. At the outset of all experiments, 8-10-week-old male and female mice
were used. C57BL/6J and CD11b-/- (CR3-/-) mice were obtained from Jackson
Laboratories. C3-/-, CR1/2-/- (in mice, CR1 and CR2 are splice variants both
derived from the mouse Cr2 gene), and C3aR-/- mice (>10 generations
backcrossed to C57BL/6) were obtained from John Atkinson (Washington University),
CX3CR1-GFP+/- mice (>10 generations backcrossed to C57BL/6) were obtained
from Richard Ransohoff (Lerner Research Institute, Cleveland Clinic Foundation),
μMT-/- mice (C57BL/6 background) were obtained from Michael Diamond
(Washington University), and Il34-/- mice (C57BL/6 background) were obtained
from Marco Colonna (Washington University). All mice were randomly assigned 
to control or experimental groups at the beginning of each experiment. All exper- 
imental protocols were performed in compliance with the Washington University 
School of Medicine Animal Studies Committee (protocol number 20140122). 
Mouse models of WNV infection. The WNV strain 3000.0259 was isolated in 
New York and passaged once in C6/36 Aedes albopictus cells to generate an insect- 
cell-derived stock. Then 100 plaque-forming units (pfu) of WNV-NY99 were 
delivered in 50 μl to the footpad of anaesthetized mice. WNV-NS5-E218A, which
harbours a single point mutation in the 2’ O-methyl-transferase gene, was obtained
from Michael Diamond (Washington University) and passaged in Vero cells as
described previously®. Deeply anaesthetized mice were administered with 10⁴ pfu
of WNV-NS5-E218A or 10 pfu of WNV-NY99 in 10 μl into the brain’s third
ventricle via a guided 29 gauge needle. 

Stock titres of all viruses were determined by using BHK21 cells for viral plaque 
assay as previously described'4. 
Leukocyte isolation and flow cytometry. Cells were isolated from brains of 
wild-type mice at day 7 post-infection and stained with fluorescently conjugated 
antibodies to CD4, CD8, CD11b, and CD45 as previously described!*. Data 
collection and analysis were performed with an LSRII flow cytometer using FlowJo 
software. 
Behavioural testing. Test for anxiety and locomotor behaviour. The open-field 
test was used to assess baseline differences in anxiety or locomotor behaviour, 
before Barnes maze experiments. A standard Open Field arena (54 x 54cm, 
custom built) was used, consisting of a simple square box with a grid (6 squares per 
side) along the base. Animals were placed in the centre of the arena and allowed 
to explore for 5 min. The arena was decontaminated with 70% ethanol between 
each trial. Locomotor activity was assessed by counting the number of lines the 
animal crossed during the testing period, and anxiety was assessed by counting 
the number of times the animal crossed through the centre of the field. Behaviour 
was recorded using a camera (Canon PowerShot SD1100 IS), and a blinded 
experimenter scored the trial. Any mice that jumped out of the open-field maze 
were excluded from analyses. 
Test for visual spatial memory. The Barnes maze was used to assess visual spatial 
memory. An elevated Barnes maze (91.4cm diameter, custom built) containing 
19 empty holes and 1 target hole with a hidden escape chamber was used for testing 
(5cm diameter holes were evenly spaced around the table, 6.35 cm from the edge). 
Visual cues were placed around the room and remained in the same location during 
the entire testing period. Mice were tested on the Barnes maze over the course of 
5 consecutive days. Each mouse received two trials per day, spaced exactly 30 min 
apart. For each trial, the mouse was placed in the centre of the maze in a covered 
start box for 10s, and removal of the box signalled the start of a trial. Each mouse 
was given 3 min to explore the maze and find the target hole. Mice that did not 
enter the target hole within 3 min were gently guided into the hole. After each trial, 
the mouse remained in the target hole for exactly 1 min, and then was returned to 
its home cage. The maze was decontaminated with 70% ethanol between each trial. 
The numbers of errors (nose pokes over non-target holes) and the latency to find 
the target hole (amount of time elapsed before nose poke over target hole) were 
measured. Behaviour was recorded using a camera (Canon Powershot SD1100IS), 
and a blinded experimenter scored the trials. Any mice which fell off the Barnes 
maze table during any trial were excluded from analyses. No randomization was 
required for these studies. 
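
As a rough illustration of how the two Barnes maze readouts described above could be summarized per mouse and per day (the mean of the two daily trials), a minimal Python sketch follows. The record layout and function name are hypothetical, and capping the latency of guided mice at the 3 min trial limit is an assumption; the text does not state how such trials were scored.

```python
from collections import defaultdict
from statistics import mean

MAX_TRIAL_SECONDS = 180.0  # each trial lasted at most 3 min


def daily_means(trials):
    """Average errors and latency over the trials of each (mouse, day).

    trials: iterable of (mouse_id, day, errors, latency_s) tuples, where
    errors counts nose pokes over non-target holes and latency_s is the
    time elapsed before the nose poke over the target hole.
    """
    grouped = defaultdict(list)
    for mouse_id, day, errors, latency_s in trials:
        # Assumption: trials that ran past the limit are scored as 180 s.
        capped = min(latency_s, MAX_TRIAL_SECONDS)
        grouped[(mouse_id, day)].append((errors, capped))
    return {
        key: (mean(e for e, _ in vals), mean(t for _, t in vals))
        for key, vals in grouped.items()
    }


if __name__ == "__main__":
    example = [("m1", 1, 10, 120.0), ("m1", 1, 6, 200.0)]
    print(daily_means(example))
```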
Immunohistochemistry. Following perfusion with ice-cold PBS and 4% PFA, 
brains were immersion-fixed overnight in 4% PFA, followed by cryoprotection in 
two exchanges of 30% sucrose for 72 h, then frozen in OCT (Fisher). 9-μm-thick
fixed-frozen coronal brain sections were washed with PBS and permeabilized with 
0.1% Triton X-100 (Sigma-Aldrich), and nonspecific antibody was blocked with 
5-10% normal goat serum (Santa Cruz Biotechnology) for 1h at room temperature. 
Mouse on mouse kit (MOM basic kit, Vector) was used as per the manufacturer’s 
protocol when detecting synaptophysin to reduce endogenous mouse 
antibody staining. After block, slides were exposed to primary antibody or 
isotype matched IgG overnight at 4°C, washed with 1x PBS and incubated with 
secondary antibodies for 1h at room temperature. Nuclei were counterstained with 
TO-PRO-3 (Invitrogen) and coverslips were applied with Vectashield (Vector).
Immunofluorescence was analysed using a Zeiss LSM 510 laser-scanning confocal
microscope and accompanying software (Zeiss). Positive immunofluorescent
signals were quantified by a blinded experimenter using the NIH Image analysis
software, ImageJ.

TUNEL staining was performed using the TMR-red in situ cell death detection
kit (Roche) as per manufacturer’s instructions. C1QA staining was performed as
previously described”’.
Antibodies. C1QA (undiluted, described previously”), WNV (1:100, described 
previously"), rat anti-GFAP (1:200, Invitrogen catalogue number 13-0300), rabbit 
anti-IBA1 (1:100, WAKO catalogue number 019-19741), mouse anti-NeuN-biotin 
(1:100, Millipore catalogue number MAB 377B), chicken anti-GFP (1:1,000, 
Abcam catalogue number 13970), rabbit anti-Synapsin1 (1:200, Millipore catalogue 
number Ab1543), mouse anti-synaptophysin (1:50, DAKO catalogue number 
M0776 or 1:50 Abcam catalogue number ab8049), guinea-pig anti-VGlut1 (1:300,
Synaptic Systems catalogue number 135304), rabbit anti-Homer1 (1:200, Synaptic
Systems 160003), rabbit anti-C3d (1:500, DAKO catalogue number A0063), rabbit
anti-S100β (1:300, Abcam catalogue number ab52642), rat anti-Lamp1 (1:50, BD
Pharmingen catalogue number 553792), and rat anti-CD68 (1:200, Serotec
catalogue number MCA1957).

Secondary antibodies conjugated to Alexa-488, Alexa-555, or Alexa-633 
(Invitrogen) were used at 1:400 dilution. 

MRI. Mice were intracardially perfused, first with ice-cold PBS and then with 
a mixture of 4% PFA and 10% Multihance (gadobenate dimeglumine, Bracco 
Diagnostics, Princeton, NJ). Heads were further fixed in 4% PFA for 24h before 
being trimmed of extraneous tissue around the skull (to minimize the field of 
view). Heads were then placed in 1% Multihance in PBS until being imaged 
several days later. Ex vivo, whole-head MR imaging experiments were performed at 
4.7 T using an Agilent/Varian (Santa Clara, CA) DirectDrivel small-animal scan- 
ner. Data were collected with a custom-made RF foil coil that fits tightly around the 
head using a 3D, T1-weighted gradient echo sequence with the following
parameters: TR = 105 ms, TE = 6 ms, flip angle = 90°, isotropic resolution = (0.0625 mm)³,
and scan time ~11 h. Regions of interest were manually drawn for the whole brain
and hippocampus in ITK-SNAP (http://www.itksnap.org) from which volumes 
were calculated. 
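
For orientation, a volume can be obtained from a manually drawn region of interest by counting its voxels and multiplying by the voxel volume, here (0.0625 mm)³. The Python sketch below assumes the segmentation is available as a flat sequence of integer labels; the function name and label values are hypothetical, and this is not a description of the ITK-SNAP export that was actually used.

```python
VOXEL_EDGE_MM = 0.0625                  # isotropic resolution stated in the text
VOXEL_VOLUME_MM3 = VOXEL_EDGE_MM ** 3   # volume of a single voxel in mm^3


def roi_volume_mm3(labels, roi_label):
    """Volume of one labelled region: voxel count times voxel volume.

    labels: iterable of integer labels, one per voxel (hypothetical layout).
    roi_label: the label value marking the region of interest.
    """
    n_voxels = sum(1 for value in labels if value == roi_label)
    return n_voxels * VOXEL_VOLUME_MM3


if __name__ == "__main__":
    # Toy segmentation: label 1 = hippocampus, label 2 = rest of brain.
    toy = [1] * 4096 + [2] * 8192 + [0] * 100
    print(roi_volume_mm3(toy, 1), roi_volume_mm3(toy, 2))
```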

Collection and purification of hippocampal RNA. Mice were perfused with 
ice-cold PBS then hippocampi were dissected and snap-frozen in Tri-reagent 
(Ambion). Hippocampal tissue was then homogenized, and RNA purified as 
previously described* using the RNA Ribopure kit (Ambion). RNA was precipitated 
with 25 mM ammonium acetate in 100% ethanol at —80°C overnight, resuspended 
in RNase-free H2O, and checked for purity. RNA was then treated with RNase-out 
and DNase I (Invitrogen) as per the manufacturer's protocol.

Microarray. Hippocampal RNA was isolated as described above and submitted 
to the Washington University Genome Technology Access Center. The total RNA 
quality and quantity were then determined by Agilent 2100 bioanalyzer (Agilent 
Technologies, Santa Clara, CA) and NanoDrop ND-1000 Spectrophotometer 
(Thermo Scientific NanoDrop, Wilmington, DE), according to manufacturer’s 
recommendations, respectively. A total of 400 ng of RNA transcripts from each 
sample were amplified by T7 linear amplification with the MessageAmp TotalPrep 
Amplification kit (Life Technologies-Ambion, Austin, TX). Then 1.5 μg of each
amplified and biotinylated RNA (aRNA) sample was hybridized onto Illumina
MouseWG-6 v2 expression beadchips, followed by Cy3 streptavidin-based staining,
washing, and scanning, according to Illumina standard protocol. The iScan 
scanner-created image data were loaded into Illumina GenomeStudio (v2011) 
for generation of expression values and data normalizations. Only those probes 
that were detected at P< 0.05 in at least one of the samples were used in down- 
stream statistical analysis. Background subtracted and quantile normalized data 
were used in statistical analysis for identification of differentially expressed genes 
with one-way ANOVA test using Partek Genomics Suite (v6.6, St. Louis, MO). 
All original P values in the ANOVA analysis were adjusted by q-value based 
multiple test correction*!. KEGG pathway analysis was performed using DAVID 
bioinformatics database functional annotation tool v.6.7 (ref. 32). Microarray 
data have been deposited in the Gene Expression Omnibus (accession number
GSE72139).
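
Two of the preprocessing steps named above, keeping probes detected at P < 0.05 in at least one sample and quantile normalization, can be sketched generically in Python. This is not the GenomeStudio or Partek implementation: the function names and the list-of-rows data layout are assumptions, and ties are broken by row order for simplicity.

```python
def filter_detected(expr, det_p, alpha=0.05):
    """Keep probes (rows) whose detection P value is below alpha in at least one sample."""
    return [row for row, ps in zip(expr, det_p) if any(p < alpha for p in ps)]


def quantile_normalize(expr):
    """Give every sample (column) the same distribution of values.

    Each value is replaced by the mean, across samples, of the values
    that share its within-sample rank.
    """
    n_probes = len(expr)
    n_samples = len(expr[0])
    columns = [[row[j] for row in expr] for j in range(n_samples)]
    sorted_cols = [sorted(col) for col in columns]
    mean_by_rank = [
        sum(col[r] for col in sorted_cols) / n_samples for r in range(n_probes)
    ]
    out = [[0.0] * n_samples for _ in range(n_probes)]
    for j, col in enumerate(columns):
        order = sorted(range(n_probes), key=lambda i: col[i])
        for rank, i in enumerate(order):
            out[i][j] = mean_by_rank[rank]
    return out


if __name__ == "__main__":
    expr = [[1.0, 6.0], [3.0, 2.0], [5.0, 4.0]]
    det_p = [[0.01, 0.20], [0.50, 0.60], [0.04, 0.03]]
    print(filter_detected(expr, det_p))
    print(quantile_normalize(expr))
```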

Real-time quantitative RT-PCR. cDNA was synthesized using random hexamers, 
oligodT15, and MultiScribe reverse transcriptase (Applied Biosystems). A single 
reverse transcription master mix was used to reverse transcribe all samples to 
minimize differences in reverse transcription efficiency. The following conditions 
were used for reverse transcription reactions: 25°C for 10 min, 48°C for 30 min, 
and 95°C for 5 min. 

For all primer sets except for the strand-specific WNV PCR reaction (detailed 
below), PCR reactions were prepared using Power SYBR Green PCR mastermix and 
calculated copies were normalized against copies of the housekeeping gene, Gapdh. 

Primers are listed in Supplementary Table 6. 

WNV strand-specific real-time RT-PCR. Two-step strand-specific RT-PCR was 
performed using GVA and T7 tagged primers during cDNA synthesis as a modification 
to a procedure previously described™*. First, cDNA was synthesized using the 
High-Capacity cDNA Reverse Transcription Kit (Applied Biosystems) per the 
manufacturer’s instruction, with the addition of 1 pmol of the positive-strand primer 
GVA_TxsspE1229 and 1 pmol of the negative strand primer T7_TxsspE1160 to 
the 10 µl reaction mixture. Strand-specific cDNA was then amplified in a qPCR 
reaction using primers directed against the strand specific WNV sequence and 
the tag sequence. Each 12.5 µl qPCR reaction mixture contained 6.25 µl TaqMan 
Gene Expression Master mix, 10 pmol tag primer (positive mix: GVA, negative 
mix: T7), 10 pmol strand specific primer (positive mix: TXsspE1160, negative 
mix: TXsspE1229), 2.5 pmol strand specific probe (WNVsspEProbe) and 50 ng 
strand specific cDNA. Thermal cycling was performed using Applied Biosystems 
ViiA 7 Real-Time PCR system with a 384-well block. Copies were calculated 
based on a standard curve generated from purified PCR product from positive 
strand or negative strand reactions, and normalized to amount of Gapdh in each 
sample. 
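
As a rough sketch of the copy-number calculation described above, the snippet below fits a log-linear standard curve (Ct against log10 of copies) and converts sample Ct values to copies normalized to Gapdh. The function names, the least-squares fit, and all numerical values are illustrative assumptions; the source does not give the curve parameters or the software used.

```python
import math


def fit_standard_curve(copies, cts):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mx, my = sum(xs) / n, sum(cts) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, cts)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate copies from a measured Ct."""
    return 10 ** ((ct - intercept) / slope)


def copies_per_gapdh(target_ct, gapdh_ct, target_curve, gapdh_curve):
    """Target copies divided by Gapdh copies measured in the same sample."""
    return copies_from_ct(target_ct, *target_curve) / copies_from_ct(
        gapdh_ct, *gapdh_curve
    )


# Hypothetical dilution series of purified PCR product (not data from the study).
standards = [1e2, 1e3, 1e4, 1e5, 1e6]
standard_cts = [33.36, 30.04, 26.72, 23.40, 20.08]
curve = fit_standard_curve(standards, standard_cts)
```

Separate curves would be fitted for the positive-strand and negative-strand reactions, since each uses its own purified PCR product as the standard.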

For the following sequences, the bold sequence represents the tag sequence, 
while the underlined sequence represents the strand specific sequence: 
GVA_TxsspE1229: 5’-TTTGCTAGCTTTAGGACCTACTATATCTACCTGGTCAGCACGTTTGTCATTG-3’; 
T7_TxsspE1160: 5’-GCGTAATACGACTCACTATATCAGCGATCTCTCCACCAAAG-3’; 
TXsspE1229: 5’-GGGTCAGCACGTTTGTCATTG-3’; 
TXsspE1160: 5’-TCAGCGATCTCTCCACCAAAG-3’; 
T7tag: 5’-GCGTAATACGACTCACTATA-3’; 
GVAtag: 5’-TTTGCTAGCTTTAGGACCTACTATATCTACCT-3’; 
WNVsspEProbe: FAM-TGCCCGACCATGGGAGAAGCTC-TAMRA. 
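
Because the bold and underline markup does not survive in plain text, the tag and strand-specific portions of each tagged primer can be recovered by string comparison. The sketch below uses the sequences exactly as listed above; the dictionary and helper name are illustrative. As transcribed here, the strand-specific portion of GVA_TxsspE1229 is one 5' G shorter than TXsspE1229, which may reflect the primer design or an extraction artefact.

```python
PRIMERS = {
    "GVA_TxsspE1229": "TTTGCTAGCTTTAGGACCTACTATATCTACCTGGTCAGCACGTTTGTCATTG",
    "T7_TxsspE1160": "GCGTAATACGACTCACTATATCAGCGATCTCTCCACCAAAG",
    "TXsspE1229": "GGGTCAGCACGTTTGTCATTG",
    "TXsspE1160": "TCAGCGATCTCTCCACCAAAG",
    "T7tag": "GCGTAATACGACTCACTATA",
    "GVAtag": "TTTGCTAGCTTTAGGACCTACTATATCTACCT",
}


def tail_after_tag(tagged, tag):
    """Return the part of a tagged primer that follows its 5' tag sequence."""
    if not tagged.startswith(tag):
        raise ValueError("tagged primer does not start with the tag")
    return tagged[len(tag):]


# Strand-specific portions recovered from the two tagged cDNA-synthesis primers.
t7_tail = tail_after_tag(PRIMERS["T7_TxsspE1160"], PRIMERS["T7tag"])
gva_tail = tail_after_tag(PRIMERS["GVA_TxsspE1229"], PRIMERS["GVAtag"])
```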

Quantification of synaptic terminals. ImageJ was used to threshold single-plane 
confocal images, draw a region of interest encompassing the CA3 mossy fibres, and 
quantify the number of synaptophysin or VGlut1+ puncta measuring between 
0.5 and 25 µm² in area. For each mouse, at least 12 images at 63× magnification 
were counted, which were derived from at least 4 fixed-frozen coronal 
sections spaced 50 µm apart. 
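
The size-gated puncta count can be illustrated with a small connected-component pass over a thresholded (binary) image. This is a sketch of the general idea only, not ImageJ's particle analysis: the 4-connectivity, the nested-list mask, and the `pixel_area_um2` parameter are assumptions made for the example.

```python
def count_puncta(mask, pixel_area_um2, min_area=0.5, max_area=25.0):
    """Count connected foreground regions whose area lies within [min_area, max_area].

    mask is a 2D list of 0/1 values from a thresholded single-plane image;
    areas are in square micrometres. Regions are 4-connected (an assumption).
    """
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Flood-fill one region and measure its size in pixels.
            stack, size = [(r, c)], 0
            seen[r][c] = True
            while stack:
                y, x = stack.pop()
                size += 1
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            if min_area <= size * pixel_area_um2 <= max_area:
                count += 1
    return count
```

The lower bound excludes single-pixel noise and the upper bound excludes merged or out-of-focus blobs, which is the purpose of the 0.5 to 25 µm² window.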

Quantification of synaptic terminal engulfment by IBA1+ cells. ImageJ was used 
to threshold single-plane confocal images, draw a region of interest encompassing 
each IBA1+ cell within the hippocampal CA3, and to quantify the number of 
synaptophysin+, Lamp1+, IBA1+ puncta between 0.2 and 25 µm² in area within 
each IBA1+ cell. For each mouse, at least 8 images at 63× magnification were 
counted, which were derived from at least 4 fixed-frozen coronal sections spaced 
50 µm apart. 

Electron microscopy. Mice were perfused with 4% PFA, the brain was removed, 
and immersion fixed for 24h at room temperature. Tissue was washed in PBS, 
and 100 µm sections were cut from regions of interest using a vibratome. Sections 
were incubated in 0.5% gelatin, 5% horse serum, and 0.01% saponin in PBS for 
5 h on a rotator at room temperature. Sections were then incubated for 48 h at 4°C 
on a rotator with rabbit anti-IBA1 (1:600; Wako). After washing in dPBS, sections 
were incubated overnight at 4°C on rotator with donkey anti-rabbit biotinylated 
secondary antibody (1:500; Rockland) in 0.5% gelatin and 5% horse serum in 
dPBS. Sections were again washed, and then incubated with streptavidin-HRP 
(1:1000; Rockland, S000-03) for 3 h at room temperature, followed by another 
wash. HRP was visualized using the DAB Substrate Kit (Cell Marque, 957D) for 
5 min, washed and then fixed with 2% PFA, 2.5% glutaraldehyde, in PBS for 30 min, 
followed by a wash in PBS. Sections were post-fixed in 1% osmium tetroxide in PBS 
for 30 min at room temperature, washed in PBS, and then dehydrated in sequential 
concentrations of ethanol for 30 min each. Sections were then infiltrated with 
1:1 Spurr’s resin and 100% ethanol overnight on a rotator, followed by two changes in 
pure Spurr’s resin over 24h. Sections were embedded with Aclar film (EMS, 50425), 
and polymerized at 60°C for 48 h. The hippocampus was trimmed from the polymerized 
section, glued to a previously prepared block of Spurr’s resin, and allowed 
to cure for at least 24h. Using a Diatome ultra 45° diamond knife and a LEICA 
Ultracut UC7, blocks were sectioned at 500 nm to confirm location of the tissue and 
the positivity of antibodies. Once confirmed, 90 nm sections were cut and picked 
up onto 200 hex mesh, formvar-carbon coated copper grids (Ted Pella, 01800-F). 
Images were digitally captured using a JEOL 1200 EX II transmission electron 
microscope with AMT digital camera. 

Three-dimensional reconstruction of confocal z-stack images. Confocal z-stack 
images taken with a Zeiss LSM 510 META microscope at 63× magnification, 
consisting of at least 10 images, were transformed into 3D reconstruction videos 
using Volocity 3D image analysis software (PerkinElmer). 

Super-resolution microscopy of presynaptic terminals. Sections were imaged 
using a Zeiss ELYRA PS1 microscope. Optical slices of 0.101 µm of the CA3 region of 
the hippocampus were captured and images subsequently processed using Zeiss 
SIM algorithms to generate structured illumination images. Zeiss Zen software 
was used to generate orthogonal viewpoints showing colocalization of VGlut1 and 
C1q in the x, y and z planes. 

Human tissue. Human autopsy hippocampal tissue embedded in paraffin 
was obtained from St. Louis University Medical Center (St. Louis, MO) and 
Presbyterian / St. Luke’s Medical Center (Denver, CO). Sections were first deparaffinized 
and then boiled in 10 mM sodium citrate buffer for 30 min, for antigen 
retrieval before staining with anti-synaptophysin antibody (DAKO, 1:50). 

In situ hybridization. Fluorescent in situ hybridization was performed on 
9 µm coronal brain sections that were 4% PFA-fixed and frozen. C1QA and 
NSE mRNAs were detected by double staining and IBA1 by immunostaining. C1qA 
and NSE anti-sense RNA was made and labelled with cyanine and fluorescein, 
respectively, using an RNA labelling kit (Roche). C1QA was amplified by PCR 
from pCMV SPORT6 C1qA plasmid (Openbiosystem MMM1013-63584), 
using forward 5’-GGCATCCGGACTGGTATCCGAGG-3’ and reverse 
5’-GGTAAATGCGACCCTTTGCGGGG-3’ primers, which was digested with SalI 
and transcribed with T7 promoter. The RNA probes were incubated overnight at 
64°C, and then detected with antibodies against fluorescein and cyanine (Roche). 
The staining reaction was then amplified with a TSA staining kit (PerkinElmer). 
A rabbit anti-IBA1 (Wako) antibody was used to label macrophages and microglia 
and detected with a donkey anti-rabbit 647 antibody (Life technologies). Stained 
sections were then imaged and analysed on a Zeiss AX10 fluorescent microscope. 

Statistical analysis. To determine mouse group sizes for virological or immunological 
studies, power analysis was performed using the following values: probability of 
type I error = 0.05, power = 80%, fivefold hypothetical difference in mean, and 
population variance of 25-fold (virological studies) or 12-fold (immunological 
studies). Results from Barnes maze spatial learning and memory studies were compared 
by repeated measures two-way ANOVA. Microarray data were analysed by 
one-way ANOVA and fold change greater than 1.5, false discovery rate q < 0.05 
to correct for multiple hypotheses for mock vs WNV-all comparison, P < 0.05 
for WNV good learners vs WNV poor learners. Variance between groups was 
equivalent except for cases noted within figure legends and the comparison for 
number of presynaptic terminals in Fig. 3a and the comparisons at 7 and 25 dpi in 
Fig. 3f, in which a Welch's correction on the two-tailed t-test was used to correct for 
unequal variance. All other experiments were compared by Student's two-tailed 
t-test, with *P < 0.05 considered significant. Power calculations using results 
observed in a pilot study in which WNV-recovered mice exhibit a two-fold increase 
in peak errors compared with mock-infected animals indicate that at least 15 recovered 
mice per group will be required to obtain statistical significance (P < 0.05) 
on Barnes maze testing. 
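
For the comparisons with unequal variance, Welch's correction changes both the standard error and the degrees of freedom of the t-test. The sketch below computes the Welch t statistic and the Welch-Satterthwaite degrees of freedom using only the standard library; the function name and the sample values are illustrative, and obtaining a P value would additionally require a t-distribution function from a statistics package.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    a and b are independent samples; variance() is the unbiased sample variance.
    """
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical counts for two groups with clearly different spread.
t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

Unlike Student's test, the degrees of freedom here are generally non-integer and fall below n1 + n2 - 2 when the variances differ, which makes the test more conservative.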
Extended Data Figure 1 | Murine intracranial infection with attenuated 
WNV-NS5-E218A induces similar viral loads and inflammatory 
response as wild-type WNV-NY99, but greater overall survival. 

a, Plaque assay for infectious virus (measured in plaque-forming units 
per g of tissue) performed on dissected brain tissue at various days post- 
infection with either footpad infection with 10° pfu of WNV-NY99 or 
intracranial infection with 10* pfu of WNV-NS5-E218A. Each point 
represents an individual mouse. b, Survival curves of mice infected 
at 8 weeks old by the footpad with WNV-NY99 or intracranially 
with WNV-NY99 or WNV-NS5-E218A. c, Flow cytometric analysis 
of dissected cortex, hippocampus and cerebellum at 6 dpi with 
WNV-NY99 and WNV-NS5-E218A with plots for CD45 and CD11b. 

d, Quantification of flow cytometry data from c. Shown are numbers of 
leukocytes (CD45"8"), lymphocytes (CD45"85, CD11b!™), and activated 
macrophages and microglia (CD45"", CD11b®8*) compared to mock- 
infected controls (n = 4 mice per group). e, Immunostaining and counts 
for TUNEL staining for apoptotic cells with co-staining for the neuronal 
marker, NeuN, during peak infection (7 dpi) of WNV-NS5-E218A (n=5) 
compared to mock-infected controls (n = 4). DG, dentate gyrus; CTX, 
entorhinal, perirhinal, and visual cortex. f, Some mice were tested at 22 dpi 
on a three-day version of the Barnes maze, and evaluated for latency to 
find target hole (*P < 0.05 by repeated measures two-way ANOVA). 
g, Prior to Barnes maze testing, mice were tested on open field for 
total lines crossed in 2 min at 21 dpi. h, qPCR for positive strand 
(non-replicating strand) and negative strand (replicating) WNV envelope 
protein message remaining in hippocampal tissue at 7, 25 and 52 dpi 
(n = 13, 4, and 14 mice per group for 7, 25, and 52 dpi, respectively), 
measured in copies per Gapdh. i, qPCR for positive strand WNV envelope 
protein at 52 dpi in WNV good learners (fewer than 8 errors on day 2 of 
Barnes maze, n = 5) and WNV poor learners (greater than 9.5 errors on 
day 2 of Barnes maze, n = 9). j, qPCR for negative strand WNV envelope 
protein at 52 dpi in WNV good learners (fewer than 8 errors on day 2 of 
Barnes maze, n = 5) and WNV poor learners (greater than 9.5 errors on 
day 2 of Barnes maze, n = 9). Result was not significant by Student’s 
two-tailed t-test. 
Extended Data Figure 2 | At 25-52 days post-WNV-NS5-E218A 
infection, mice do not show any appreciable loss in brain volume, 
neuron or astrocyte numbers, or macrophage infiltration. 

a, Immunostaining for the neuronal marker, NeuN, with TUNEL 
staining for apoptotic cells within the hippocampus at 52 dpi. 
Quantification of the number of TUNEL+ neurons and total TUNEL+ cells 
is shown in mock (n = 3) and WNV-NS5-E218A (n = 6). Scale bar, 20 µm. 
b, Immunostaining and quantification of the number of NeuN+ neurons 
per mm² within the CA1, CA3, dentate gyrus and entorhinal cortex at 
25 days after mock (n = 4) or WNV-NS5-E218A infection. WNV-infected 
animals were subdivided into good (n = 5) and poor (n = 3) learners. Scale 
bar, 100 µm. c, Post-mortem mouse brains were imaged by MRI at 52 dpi 
to determine tissue volume of the hippocampus (outlined in red) and total 
brain (n = 5 mice per group). Scale bar, 1 mm. Not significant by Student's 
two-tailed t-test (P < 0.05 considered significant). d, Immunostaining 
for the reactive astrocyte marker, GFAP, shows that WNV-NS5-E218A-infected 
mice do not exhibit greater hippocampal astrocyte activation than 
mock-infected controls at 52 dpi. NS, not significant by Student’s two-tailed 
t-test. e, Haematoxylin and eosin (H&E) staining was performed 
at 52 dpi in WNV-NS5-E218A-recovered and mock-recovered mice. 
Occasional microglial nodules (arrowhead) surrounded by lymphocytes 
were observed within the hippocampus. CA1 pyr, CA1 pyramidal layer. 
f, Flow cytometric analysis of whole brain from mock and WNV-NS5- 
E218A-infected mice at 8 and 25 dpi was performed to determine numbers 
of microglia (CD45"", CD11b’”), macrophages (CD45™8", CD11b"8"), 
and lymphocytes (CD45, CD11b"°8*"¥¢), Note the decrease in 
macrophage population from 7 to 25 dpi. 
Extended Data Figure 3 | Despite synaptic terminal loss, no changes 
to synaptic terminal size, axons, or astrocyte or antibody association 
with terminals during WNV infection. a, Immunostaining for the 
presynaptic marker, synaptophysin, at 7 dpi comparing mock (n = 7) 
with WNV-NS5-E218A-infected (n = 5) mice. Quantification of 
synaptophysin+ puncta size was performed within the hippocampal 
CA3. Scale bar, 10 µm. b, Immunostaining for the presynaptic marker, 
synapsin1, within the hippocampal CA3 in uninfected controls (n = 3) 
and footpad-infected WNV-NY-1999 (n = 4) at 8 dpi. Quantification was 
performed on the numbers of synapsin1+ puncta per mm² with *P < 0.05 
considered significant. c, Immunostaining within the hippocampal CA3 
for SMI-31, which detects phosphorylated neurofilament and marks axons, 
at 25 dpi (n = 5-6 mice per group). Quantification of the area of SMI-31 
per mm² (not significant by Student's t-test). d, Immunostaining within 
the hippocampal CA3 for the presynaptic marker, synaptophysin, 
co-labelled with the astrocyte marker, S100β, at 7 dpi (n = 3 mice per 
group). Quantification of the percentage of total S100β+ area and 
synaptophysin+ area colocalized with S100β (not significant by Student's 
t-test). e, Electron microscopy was performed on hippocampal CA3 
sections from day 7 after mock (left panel) or WNV-NS5-E218A 
(right panels) infection, with immune-DAB enhancement of IBA1. 
Note the presence of many phagosomes and cytoplasmic inclusions 
within the WNV-E218A-infected microglia. Electron micrographs 
shown are representative of n = 3 mice per group. Scale bars, 1 µm. 
f, Immunostaining for the presynaptic marker, VGlut1, and endogenous 
murine IgG (mIgG) at 7 days after mock (n = 4) or WNV-NS5-E218A 
(n = 4) infection. Quantification was performed on the total per cent 
of mIgG staining area as well as the per cent of VGlut1+ staining area 
colocalized with mIgG. g, Immunostaining for the postsynaptic marker, 
Homer1, and endogenous mIgG at 25 days after mock (n = 4) or 
WNV-NS5-E218A infection. Infected mice were divided into those 
that made fewer than 8 errors on day 2 of the Barnes maze (WNV good 
learners, n = 5) and those that made greater than 9.5 
errors on day 2 of the Barnes maze testing (WNV poor learners, n = 3). 
Quantification was performed on the total per cent of mIgG staining 
area as well as the per cent of Homer1+ staining area colocalized with 
mIgG. Significance was determined by Student's two-tailed t-test with 
P < 0.05 considered as significant. NS, not significant. h, Immunostaining 
and quantification of number of VGlut1 hippocampal CA3 presynaptic 
terminals at 7 dpi in wild-type and µMT−/− mice (*P < 0.05; NS, not 
significant, by Student’s two-tailed t-test). Scale bars, 10 µm. 
Extended Data Figure 4 | WNV infection of human hippocampal CA2/ 
CA3 neurons with loss of synapses within the hippocampal CA1 
and the entorhinal cortex. a, Immunostaining of human WNV 
encephalitis and control post-mortem hippocampal tissue for WNV 
antigen. Shown at high magnification are neuron cell bodies (arrows) 
and neurites (arrowheads) within the hippocampal CA2/CA3 region. 
b, c, Immunostaining within the hippocampal CA1 (b) or entorhinal 
cortex (c) for the presynaptic marker, synaptophysin, within human 
WNV encephalitis and control autopsy cases. Quantification of the per 
cent of synaptophysin+ area (hippocampal CA1 P = 0.3, entorhinal cortex 
P = 0.11 by two-tailed Student's t-test; not significant). Scale bar, 20 µm. In 
one WNV encephalitis patient sample, the entorhinal cortex could not be 
quantified because it was missing from the section. 
The bacteriophage φ29 tail possesses a pore-forming 
loop for cell membrane penetration 


Jingwei Xu1, Miao Gui1, Dianhong Wang1 & Ye Xiang1 


Most bacteriophages are tailed bacteriophages with an isometric or 
a prolate head attached to a long contractile, long non-contractile, 
or short non-contractile tail’. The tail is a complex machine that 
plays a central role in host cell recognition and attachment, cell 
wall and membrane penetration, and viral genome ejection. The 
mechanisms involved in the penetration of the inner host cell 
membrane by bacteriophage tails are not well understood. Here we 
describe structural and functional studies of the bacteriophage φ29 
tail knob protein gene product 9 (gp9). The 2.0 Å crystal structure of 
gp9 shows that six gp9 molecules form a hexameric tube structure 
with six flexible hydrophobic loops blocking one end of the tube 
before DNA ejection. Sequence and structural analyses suggest that 
the loops in the tube could be membrane active. Further biochemical 
assays and electron microscopy structural analyses show that the 
six hydrophobic loops in the tube exit upon DNA ejection and form 
a channel that spans the lipid bilayer of the membrane and allows 
the release of the bacteriophage genomic DNA, suggesting that 
cell membrane penetration involves a pore-forming mechanism 
similar to that of certain non-enveloped eukaryotic viruses**. 
A search of other phage tail proteins identified similar hydrophobic 
loops, which indicates that a common mechanism might be used 
for membrane penetration by prokaryotic viruses. These findings 
suggest that although prokaryotic and eukaryotic viruses use 
apparently very different mechanisms for infection, they have 
evolved similar mechanisms for breaching the cell membrane. 

Bacteriophage φ29 infects Gram-positive Bacillus subtilis cells via a 
short non-contractile tail. The φ29 double-stranded (ds)DNA genome 
is encapsulated in a prolate capsid head*~’. Both of the 5’ ends of the 
DNA genome are covalently connected to the gene product 3 (gp3) 
protein®®. The φ29 tail is characterized by 12 tail spikes or appendages 
that hang around the approximately 380 Å-long tail tube”!”. The tail 
tube has a gp11 protein assembly at its proximal end and a gp9 and gp13 
protein assembly at its distal end'!. The gp11 protein assembly, also 
known as the ‘lower collar’, has a thin tube and a bulge that is attached 
to the head. The lower collar tube is filled with the terminal protein gp3 
and part of the genomic DNA”. The gp9 and gp13 protein assembly, 
also known as the tail ‘knob’, is a cylindrical tube whose distal end is 
blocked when DNA ejection has not been triggered. The tail spikes 
function to recognize and digest host cell wall teichoic acids and anchor 
the phage particle onto the host cell surface!*. Gp13 is a dual-function 
enzyme that specifically degrades the cell wall peptidoglycan and is 
probably the tail protein that helps the phage tail to penetrate the thick 
peptidoglycan layer'?!?!*. The φ29 tail must also penetrate the cell 
membrane in order to deliver the gp3–genomic dsDNA complex into 
the host cell cytoplasm. The mechanisms involved in membrane penetration 
by the φ29 tail and many other phage tails are unknown. 

The gp9 assembly, which is located at the tip of the tail and constitutes 
most of the tail knob, might be associated with host cell 
membrane penetration by the tail. We determined the crystal structures 
of full-length gp9 and a mutant construct, gp9Δ417–491, in which a 
disordered region (residues 417–491) was deleted (Fig. 1a, Methods, 
Extended Data Fig. 1 and Extended Data Tables 1, 2). The gp9Δ417–491 
structure is a cylindrical tube-like homo-hexamer. The longest dimension 
of the tube is approximately 125 Å. The tube has an outer diameter 
of approximately 90 Å and an inner diameter of approximately 40 Å. 
The wall of the tube is approximately 25 Å thick and comprises largely 
β-sheets (Fig. 1b, c). The polypeptide chain starts from one end of the 
tube and ends around the middle of the tube. The N-terminal roughly 
110 residues form a small β-barrel structure (N-β-barrel domain) that 
is frequently observed in other phage tail proteins or tube-forming 
proteins, functioning as an adaptor to mediate the interactions between 
tail- and tube-forming proteins'>. The polypeptide chain after the 
N-β-barrel domain extends to the other end of the tube and folds into 
another β-domain (tip β-domain). A three-strand β-sheet of the tip 
β-domain bulges from the outer surface of the tube and creates features 
similar to the distal end of the tail knob. The remaining C-terminal 
region of the polypeptide chain constitutes the central part of the tube, 
which contains an α/β-domain adjacent to the N-β-barrel domain 
and a β-domain (middle β-domain) in which the β-strands are almost 
parallel with the tube axis (Fig. 1b, c). The structure of full-length gp9 
contains a similar cylindrical hexamer, in which the deleted long loop 
(residues 417–491, L loop) protrudes from the inner wall near the tip 
β-domain and fills approximately two-thirds of the tube (Fig. 1c). The 
L loop contains several short helices and is rich in hydrophobic 
residues. The inner wall surface of the tube is negatively charged 
(Extended Data Fig. 2a). However, the surface of the L loop inside the 
tube is largely hydrophobic (Extended Data Fig. 2b). 

Fitting of the gp9 hexameric structure into a cryo-electron microscopy 
(cryo-EM) map of the tail (EMDB entry 1420) unambiguously 
places the tip β-domains at the distal end of the tail and the N-β-barrel 
domains in contact with the lower collar protein assembly (a correlation 
coefficient score of 0.673 versus 0.637 for an upside-down fitting 
using the program Situs!®; Extended Data Fig. 3). The fitting result 
suggests that, like the β-barrel domain in other phage tails, the 
N-β-barrel domain of gp9 may also function as an adaptor in mediating 
interactions with the lower collar protein assembly. The fitted gp9 
assembly accounts for approximately three-quarters of the tail knob 
density. The top one-quarter, uninterpreted density of the tail knob is 
probably a β-barrel structure as well and should be part of the lower 
collar protein assembly. Consistent with the features of the electron 
microscopy structure, the fitted gp9 assembly structure has its distal 
end blocked by the interior L loops, leaving the proximal end empty. 

The L loop in the tail tube connects two anti-parallel β-strands 
of the middle β-domain and has four short helices (H1–H4) and a 
flexible region (residues 437–457) that is highly hydrophobic and is 
not visible in the final structure (Fig. 1c). The polypeptide chain of the 
L loop extends along the tube axis for approximately 85 Å and then 
turns 180° near the top of the α/β-domain, leaning against the inner 
tube wall. The chain falls back for approximately 70 Å and finally joins 
the middle β-domain immediately above the tip β-domain (Fig. 1). 


1Centre for Infectious Diseases Research, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, Beijing Advanced Innovation Center for Structural Biology, 
Department of Basic Medical Sciences, School of Medicine, Tsinghua University, Beijing 100084, China. 
Figure 1 | Overall structure of gp9. a, Diagram showing the polypeptide 
chains of full-length gp9 and gp9Δ417–491. The structural domains of 
gp9 are shown in orange (N-β-barrel), blue (tip β-barrel), grey (middle 
β-domain), purple (α/β-domain) and red (L loop). b, c, Left, ribbon 
diagram of structures of gp9Δ417–491 (b) and full-length gp9 (c) 
monomers. The structural domains are coloured as in a. The four short 
helices in the L loop structure are labelled H1–4. Middle, ribbon diagram 
of hexamer structures of gp9Δ417–491 (b) and full-length gp9 (c) with the 
L loops coloured red. The positions of the six monomers are indicated by 
numbers. Right, ribbon diagram of a central section of the gp9Δ417–491 
(b) and full-length gp9 (c) hexamer structures with the L loops coloured 
red. The disordered regions of the L loops are represented by dashed lines. 
[Figure 1 panel labels: α/β-domain; Middle β-domain; 90°] 


The six H1 helices near the distal end of the tube form a helix barrel 
around the tube axis. The missing flexible region is approximately 21 Å 
long and is located near the tube axis close to where the loop turns. The 
L loops fill most of the inner space of the tube, leaving a channel that is 
about 6.1 Å wide near the tube axis. The narrow channel is obviously 
too small for the transportation of dsDNA, which has a gyration radius 
of approximately 20 Å. However, as shown in the gp9Δ417–491 structure, 
the tube without the L loop has an inner channel that is sufficiently 
large (about 40 Å in diameter) for dsDNA ejection, which suggests that 
the L loop must exit from the tube before or simultaneously with DNA 
ejection. A search’” for homologous sequences of the L loop shows that 
the flexible region of the L loop presents hydrophobic features and a 
sequence pattern (Fig. 2) that are similar to those of the type I fusion 
peptides used by certain enveloped viruses, such as HIV and influenza 
viruses. Hydrophobic membrane active peptides that are essential for 
host cell entry have also been found in non-enveloped viruses!*. The 
search results suggest that the L loop may be membrane active and play 
a role in membrane penetration. 

We found that the DNA ejection of bacteriophage φ29 could be 
induced by low-pH buffers at around pH 4.0. Under low-pH conditions, 
the DNA-emptied particles tend to aggregate through the tail 
tips (Fig. 3a). However, the particles are dispersed in the presence 
of 1% (v/v) Triton X-100 (Fig. 3b), which suggests that the newly 
exposed surface after DNA ejection is hydrophobic. Cryo-EM studies 
of the low-pH-treated, Triton X-100-stabilized and DNA-emptied 
particles show an additional cone-shaped structure at the distal tip 
of the tail knob. The cone shape density extends the tail for approximately 
40 Å and is connected to the tail knob through a narrow neck 
(Fig. 4a, b). A similar cone-shaped structure was observed in previous 
cryo-EM studies of sodium-perchlorate-treated DNA-emptied phage 
particles’; this structure had an inner channel for dsDNA release 
and was not clearly assigned to any of the tail proteins. The 
post-DNA-ejection structures of the tail knob and the biochemical data 
suggest that the hydrophobic L loop exits to form the cone-shaped 
structure when DNA ejection is triggered. To test further the possible 
membrane penetration function of the L loop, we premixed 
mature phage particles with liposomes containing phospholipids 
similar to those in the B. subtilis cell membrane’’. The phage particles 
were full and dispersed in the neutral pH liposome solution 
(Extended Data Fig. 4a). However, when DNA ejection was triggered 
using low-pH buffers, cryo-EM examination of the phage-liposome 
mixture showed that the DNA-emptied particles were able to stand 
on the liposome surface with their tail tips in contact with the membrane 
(Fig. 3c), which is similar to phage particles on an infected host 
cell surface”. Dense DNA densities are observed within the liposomes, 
suggesting that a large amount of the genomic DNA has been 

[Figure 2 alignment: consensus motif GHHHXGHHGHHGG; HIV gp41 residues 1–36 aligned with φ29 gp9 residues 429–465, with secondary structure shown below] 

Figure 2 | Sequence alignment of the gp9 L loop and the HIV fusion 
peptide. Secondary structures are shown below the alignments. 
Completely conserved residues are shown in white on a red background. 
Conserved residues are boxed. The consensus sequence is represented by 
H for hydrophobic, G for glycine or serine and X for any residue. The core 
region of the HIV fusion peptide is indicated by a red dashed box. 
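
The consensus notation in this caption (H for hydrophobic, G for glycine or serine, X for any residue) can be turned into a simple pattern search. The sketch below is only an illustration of that notation: the set of residues counted as hydrophobic is an assumption not stated in the source, and the test sequence is a made-up toy peptide, not the gp9 or gp41 sequence.

```python
import re

# Assumed hydrophobic residue set; the caption does not define it.
HYDROPHOBIC = "AVILMFWC"

# Map each consensus symbol to a regular-expression character class.
SYMBOL_CLASSES = {"G": "[GS]", "H": f"[{HYDROPHOBIC}]", "X": "."}


def motif_to_regex(motif):
    """Translate a consensus string such as GHHHXGHHGHHGG into a regex source."""
    return "".join(SYMBOL_CLASSES[symbol] for symbol in motif)


def find_motif(sequence, motif="GHHHXGHHGHHGG"):
    """Return 0-based start positions of all (possibly overlapping) matches."""
    pattern = re.compile(f"(?=({motif_to_regex(motif)}))")
    return [match.start() for match in pattern.finditer(sequence)]


# Toy peptide constructed to contain one match (hypothetical sequence).
toy_hits = find_motif("KKGAVLQSILGFMSGEE")
```

A real search would typically tolerate mismatches or use a profile rather than an exact pattern, since the caption distinguishes completely conserved residues from merely conserved ones.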
Figure 3 | EM images of post-DNA-ejection φ29 particles. a, Negatively 
stained image of low-pH-treated φ29 particles showing aggregation of 
most particles via the tail tips. b, Negatively stained image of 
low-pH-treated, Triton-X-100-stabilized φ29 particles showing dispersion of the 
particles. c, A cryo-EM image showing low-pH-treated φ29 particles 
standing on liposomes. Dense dsDNA strands can be observed inside 
the liposomes. Left, φ29 particle and liposome complexes in a grid hole 
(~1.2 µm in diameter); right, enlarged view of a liposome-φ29 particle 
complex (boxed area in left-hand image). Black arrowhead, boundary 
of the liposome membrane; red arrowhead, dsDNA strand within the 
liposome. Scale bars, 200 nm. 
[Figure 3 panel labels: a, Low pH; b, Low pH plus 1% (v/v) Triton X-100; c, Low pH plus liposomes] 


ejected into the liposomes. Single-particle reconstructions of the 
particles on the liposomes show a similar cone-shaped structure at 
the distal end of the tail. The cone-shaped structure is embedded in 


NaClO,-treated 


Mature 29 DNA-emptied 


Low-pH-treated 
CsCl gradient-purified CsCl gradient-purified | Low-pH-treated 
DNA-emptied 


the membrane and spans the lipid bilayer (Fig. 4). Notably, the mem- 
brane is dented around the position where the cone-shaped structure 
is embedded. Compared with the sodium-perchlorate-treated parti- 
cles, a small portion of the genomic dsDNA remains in the heads 
of the low-pH-treated particles. The low-pH-treated cone-shaped 
structures lack an obvious inner channel, probably because they are 
filled with the dsDNA genome. Further purification of the low-pH- 
treated particles with an isopycnic CsCl gradient completely removed 
the genomic dsDNA from the emptied particles. A cryo-EM recon- 
struction of the low-pH-treated and CsCl-gradient-purified particles 
was calculated and showed a cone-shaped structure with an inner 
channel at the distal end of the tail (Fig. 4). The pore-forming ability 
of the L loop was confirmed using a dye release assay (Methods 
and Extended Data Fig. 5). Structural modelling of the L loop con- 
formation in the cone-shaped structure using RosettaCM”! shows 
that the flexible region and H2 form the wall of the cone-shaped 
structure (Extended Data Fig. 6). Consistent with the experimental 
data, the modelled structure also suggests that the outer surface of 
the cone-shaped structure is highly hydrophobic. The height of the 
cone-shaped structure is approximately the same as the thickness 
of a lipid bilayer. Notably, we found that the gp9 L loop could also 
penetrate a liposome membrane containing phospholipids similar 
to those in the eukaryotic cell membrane (Extended Data Fig. 4b). 
Blast searches for similar loops in phage tail proteins suggested that 
this mechanism might be used for cell membrane penetration by 
many phages (Extended Data Fig. 7), including bacteriophage T4, 
which has a long contractile tail. 

Viruses must breach the physical membrane barrier to deliver their 
genome into the host cell cytoplasm. Most enveloped eukaryotic 
viruses insert a hydrophobic fusion peptide into the host membrane 
by using conformational changes in the fusion-peptide-related 
structure to mediate the fusion of the cell membrane and the viral 
membrane, thus overcoming the host membrane barrier. Non- 
enveloped eukaryotic viruses use a membrane-active peptide to 
penetrate membranes through a pore-forming or a local cellular 
membrane disruption mechanism!*. Our identification of a hydro- 
phobic, pore-forming, membrane-active peptide in a prokaryotic 
virus suggests that prokaryotic and eukaryotic viruses share common 
mechanisms for membrane penetration, possibly as a result of 
convergent evolution. 




Figure 4 | Pre- and post-DNA-ejection structures of the φ29 tail. 
Comparison of the cryo-EM densities of five tails: DNA-filled mature; 
NaClO4-treated and CsCl-gradient-purified; low-pH-treated and 
CsCl-gradient-purified; low-pH-treated; and low-pH-treated with 
liposomes. The DNA-emptied particles have an additional cone-shaped 
density at their distal ends. a, Surface-shaded view. b, Cross-section of 
the cryo-EM densities. The lipid bilayer has weaker density than the 
phage tail and is not visible at a contouring level of 3.0σ. The boundary 
of the lipid bilayer is indicated with dashed lines. 


© 2016 Macmillan Publishers Limited. All rights reserved 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 


Received 12 October 2015; accepted 14 April 2016. 
Published online 15 June 2016. 


1. Ackermann, H. W. Bacteriophage observations and evolution. Res. Microbiol. 
154, 245-251 (2003). 

2. Panjwani, A. et al. Capsid protein VP4 of human rhinovirus induces membrane 
permeability by the formation of a size-selective multimeric pore. PLoS Pathog. 
10, e1004294 (2014). 

3. Shukla, A., Padhi, A. K., Gomes, J. & Banerjee, M. The VP4 peptide of hepatitis A 
virus ruptures membranes through formation of discrete pores. J. Virol. 88, 
12409-12421 (2014). 

4. Galloux, M. et al. NMR structure of a viral peptide inserted in artificial 
membranes: a view on the early steps of the birnavirus entry process. J. Biol. 
Chem. 285, 19409-19421 (2010). 

5. Tao, Y. et al. Assembly of a tailed bacterial virus and its genome release studied 
in three dimensions. Cell 95, 431-437 (1998). 

6. Anderson, D. L. & Reilly, B. E. in Bacillus subtilis and Other Gram-Positive 
Bacteria: Biochemistry, Physiology, and Molecular Genetics (eds Sonenshein, 
A. L., Hoch, J. A. & Losick, R.) 859-867 (American Society for Microbiology, 1993). 

7. Morais, M. C. et al. Conservation of the capsid structure in tailed dsDNA 
bacteriophages: the pseudoatomic structure of φ29. Mol. Cell 18, 149-159 (2005). 

8. Meijer, W. J., Horcajadas, J. A. & Salas, M. φ29 family of phages. Microbiol. Mol. 
Biol. Rev. 65, 261-287 (2001). 

9. Xiang, Y. et al. Structural changes of bacteriophage φ29 upon DNA packaging 
and release. EMBO J. 25, 5229-5239 (2006). 

10. Tang, J. et al. DNA poised for release in bacteriophage φ29. Structure 16, 
935-943 (2008). 

11. Xiang, Y. et al. Crystal and cryoEM structural studies of a cell wall degrading 
enzyme in the bacteriophage φ29 tail. Proc. Natl Acad. Sci. USA 105, 
9552-9557 (2008). 

12. Xiang, Y. et al. Crystallographic insights into the autocatalytic assembly 
mechanism of a bacteriophage tail spike. Mol. Cell 34, 375-386 (2009). 

13. Cohen, D. N. et al. Shared catalysis in virus entry and bacterial cell wall 
depolymerization. J. Mol. Biol. 387, 607-618 (2009). 

14. Cohen, D. N., Erickson, S. E., Xiang, Y., Rossmann, M. G. & Anderson, D. L. 
Multifunctional roles of a bacteriophage φ29 morphogenetic factor in 
assembly and infection. J. Mol. Biol. 378, 804-817 (2008). 




15. Cardarelli, L. et al. Phages have adapted the same protein fold to fulfill multiple 
functions in virion assembly. Proc. Natl Acad. Sci. USA 107, 14384-14389 
(2010). 

16. Wriggers, W. Using Situs for the integration of multi-resolution structures. 
Biophys. Rev. 2, 21-27 (2010). 

17. Remmert, M., Biegert, A., Hauser, A. & Söding, J. HHblits: lightning-fast iterative 
protein sequence searching by HMM-HMM alignment. Nat. Methods 9, 
173-175 (2011). 

18. Johnson, J. E. & Vogt, P. K. Cell entry by non-enveloped viruses. Curr. Top. 
Microbiol. Immunol. 343, v-vii (2010). 

19. Doan, T. et al. FisB mediates membrane fission during sporulation in Bacillus 
subtilis. Genes Dev. 27, 322-334 (2013). 

20. Xiang, Y. & Rossmann, M. G. Structure of bacteriophage φ29 head fibers has a 
supercoiled triple repeating helix-turn-helix motif. Proc. Natl Acad. Sci. USA 
108, 4806-4810 (2011). 

21. Song, Y. et al. High-resolution comparative modeling with RosettaCM. Structure 
21, 1735-1742 (2013). 


Acknowledgements We thank L. Q. Zhang, N. Yan, H. T. Li, S. L. Fan, N. Gao, C. Z. 
Zhou, D. L. Anderson and M. G. Rossmann for support; the Tsinghua University 
Branch of the China National Center for Protein Sciences for the facility support; 
and the staff at the Shanghai Synchrotron Research Facility beam line BL17U 
for assistance with data collection. This work was supported by funds from 
the 973 program (2015CB910102), the National Natural Science Foundation 
of China (31470721 and 81550001), the Junior Thousand Talents Program 
of China (20131770418) and the Beijing Advanced Innovation Center for 
Structural Biology to Y.X. 


Author Contributions J.X. and Y.X. designed the research; J.X., M.G., D.W. and 
Y.X. performed the experiments; J.X. and Y.X. analysed the data and wrote the 
paper; and all authors contributed to the editing of the manuscript. 


Author Information The atomic coordinates and structure factor files have 
been deposited into the Protein Data Bank (PDB) under accession numbers 
5FB4, 5FB5 and 5FEI. The electron microscopy maps have been deposited 
into the Electron Microscopy Data Bank (EMDB) under accession numbers 
EMD-6556, EMD-6557 and EMD-6558. Reprints and permissions information 
is available at www.nature.com/reprints. The authors declare no competing 
financial interests. Readers are welcome to comment on the online version of 
the paper. Correspondence and requests for materials should be addressed to 
Y.X. (yxiang@mail.tsinghua.edu.cn). 


23 JUNE 2016 | VOL 534 | NATURE | 547 




LETTER 


METHODS 


Protein expression and purification. Full-length gp9 consists of 599 residues. 
It took approximately half a year for the full-length protein crystals to grow. An 
SDS-PAGE gel analysis of the protein in the crystallization wells showed that most 
of the full-length protein was degraded to form several smaller fragments. A mass 
spectrometry analysis of the fragments indicated two cleavage sites at residues 
398 and 501. Additionally, secondary structure and disordered region predictions 
showed that residues 414-499 did not form any secondary structure and were 
disordered (Extended Data Fig. 1). A series of constructs were produced to delete the 
disordered region. Among the tested constructs, construct gp9Δ417-491 (Fig. 1a), 
which had the best expression level in Escherichia coli cells, was selected for large- 
scale recombinant protein production, purification and crystallization. Gene 9 was 
PCR-amplified from the genomic DNA of bacteriophage φ29 using the following 
primers: 5′-CCCATGGCATATGTACCATTATCAGGAACG-3′ and 
5′-CCGCTCGAGTCAGTGGTGGTGGTGGTGGTGCCTCAATTCATTCTCGACGC-3′. The 
genomic DNA of bacteriophage φ29 was prepared by heating the mature φ29 
phage (5 x 10” particles per ml) at 96°C for 10 min. One microlitre of the heat- 
treated phage solution was then used as a template for each PCR reaction. The 
purified PCR products were digested using the endonucleases NcoI and XhoI 
and then cloned into pET28b (Novagen) using the NcoI-XhoI sites and placing a 
His6 tag on the C terminus of the recombinant protein. The recombinant 9 gene 
was PCR-amplified from the pET28b-gp9 plasmid using the following primers: 
5′-AAGGAGGAAGCAGGTATGGCATATGTACCATTATCAGGAACG-3′ (forward) 
and 5′-CTGTGCGTGCTCCATCAGTGGTGGTGGTGGTGGTGCCTCAATTCATTCTCGACGC-3′ 
(reverse). The PCR products were cloned into pDG (Bacillus 
centre: http://bgsc.org) using a ligation-free method²². The gp9 loop deletion 
mutants were generated using two-step overlapping PCR with the primers listed 
in Extended Data Table 1. The PCR products were digested with NcoI and XhoI 
and then cloned into pET28b. 
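
The cloning description above implies that the reverse primer carries the His tag and the XhoI site on the non-coding strand. As a minimal sketch of how this can be checked, the snippet below takes the reverse complement of the reverse primer quoted in the text and counts the histidine codons that sit directly before the stop codon. The function names are illustrative assumptions, and the primer is written with six GTG repeats as read from the quoted sequence.

```python
# Reverse primer for gene 9 as quoted in the Methods (5'->3').
# The six GTG repeats are spelled out with a multiplier to avoid miscounting.
REVERSE_PRIMER = "CCGCTCGAGTCA" + "GTG" * 6 + "CCTCAATTCATTCTCGACGC"

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (upper-case ACGT only)."""
    return seq.translate(COMPLEMENT)[::-1]


def his_codons_before_stop(coding: str, stop: str = "TGA") -> int:
    """Count consecutive CAC (His) codons immediately upstream of the stop codon."""
    hit = coding.find("CAC" + stop)
    if hit < 0:
        return 0
    end = hit + 3  # index just past the last CAC before the stop codon
    count = 0
    while end >= 3 and coding[end - 3:end] == "CAC":
        count += 1
        end -= 3
    return count


coding_strand = reverse_complement(REVERSE_PRIMER)
his_count = his_codons_before_stop(coding_strand)
has_xhoi_site = "CTCGAG" in coding_strand  # XhoI recognition site

print(coding_strand)
print("His codons before stop:", his_count)
print("XhoI site present:", has_xhoi_site)
```

On the coding strand the tag reads as a run of CAC codons followed by TGA and then the XhoI site, which is consistent with a C-terminal His tag on the recombinant protein.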

The pDG-gp9 plasmids were transformed into Bacillus subtilis cells using the 
natural competence of the cells that developed at the end of the logarithmic growth 
phase under semi-starvation conditions (see 
http://wiki.biol-uw.edu.pl/t/img_auth.php/b/b5/Bacillus_subtilis_competent_cells.pdf 
for the protocol details)²³. 
Colonies containing the pDG-gp9 plasmids were selected and cultivated at 37°C 
until they reached an OD600 value of ~1. Then, recombinant full-length gp9 was 
induced for expression with 1 mM isopropyl β-D-1-thiogalactopyranoside (IPTG). 
The recombinant gp9 mutants were expressed in E. coli BL21(DE3) cells at 16°C 
following the standard protocol for IPTG-induced protein expression in E. coli 
cells (see the Novagen pET system handbook). The produced recombinant pro- 
tein was affinity purified using cobalt-charged BD TALON resins and eluted from 
the cobalt beads using an elution buffer containing 100 mM sodium phosphate at 
pH 8.0, 300 mM sodium chloride and 200 mM imidazole. The eluted protein was 
concentrated and further purified with a Superdex 200 column (GE) running in a 
buffer containing 20 mM Tris pH 8.0, 300 mM sodium chloride and 1 mM DTT. 
The peak fractions were collected and concentrated to ~10 mg ml⁻¹ for crystal 
screening. Size-exclusion chromatography purification of full-length gp9 indicated 
that this protein exists mainly as a monomer, whereas the gp9Δ417-491 mutant 
exists mainly as a multimer, probably a hexamer, in solution. 

Crystallization. All crystals were obtained by hanging-drop vapour diffusion at 
20°C using 2 µl protein (10 mg ml⁻¹) mixed with an equal volume of well solution. 
Crystals of full-length gp9 were grown in 0.1 M sodium acetate pH 4.6 and 
2.0 M ammonium sulphate. It took approximately half a year for the crystals to 
grow. The crystals were soaked for 30 s in the well solution containing a final 
concentration of 20% (v/v) glycerol and then flash frozen in liquid N2. The 
gp9Δ417-491 crystals were grown under two different conditions. Plate-shaped 
crystals were grown in 0.94% (v/v) ethanol and 10% PEG-400 in a HEPES buffer 
pH 7.5 and 2 M magnesium chloride. Diamond-shaped crystals were grown in 
1.2 M lithium sulphate and 10% (v/v) PEG-400 in a sodium acetate buffer pH 4.4. 
The gp9Δ417-491 crystals were soaked for 30 s in the well solution containing 
10% (v/v) glycerol and then flash frozen in liquid N2. The mercury derivative of 
gp9Δ417-491 was prepared by soaking the diamond-shaped crystals in the 
cryo-well solution containing 10 mM K2HgI4 for 2 h. 

Phage production, cryo-EM data collection and image processing. Fibred φ29 
particles were produced in Bacillus subtilis su44* cells infected with the mutants 
sus16(300)-sus14(1241) and purified by centrifugation in an isopycnic 65% (w/v) 
CsCl gradient. The purified phage particles (5 x 10” particles per ml) were in 
a buffer containing 50 mM Tris-HCl pH 7.8, 100 mM NaCl and 10 mM MgCl2. 
Liposomes containing phospholipids similar to those in the B. subtilis cell mem- 
brane were prepared from a chloroform solution consisting of phosphatidyl- 
ethanolamine (PE), phosphatidyl-DL-glycerol (PG), phosphatidyl-choline (PC) and 
cholesterol (CL) at a molar ratio of 5:6:2:9. Liposomes containing phospholipids 
similar to those in the eukaryotic cell membrane were prepared from a chloroform 
solution consisting of PC, PE, CL and sphingomyelin (SPH) at a molar ratio of 
1:1:3:1. The phospholipid chloroform solution was slowly blow-dried using N2 
to form films consisting of stacked phospholipid bilayers. The films were further 
dried in a vacuum desiccator for 24h and then re-suspended in a buffer containing 
25mM HEPES pH 7.4, 100 mM KCl, 10% glycerol, and 1 mM DTT. The phospho- 
lipid bilayer emulsion was frozen and thawed several times and then sonicated for 
approximately 120s until the solution cleared. The sonicated solution was passed 
through a 0.1-µm filter 10 times to form liposomes with a uniform size of approxi- 
mately 1,000 Å. The in vitro ejection of the φ29 genome was triggered by adjusting 
the buffer pH to ~4.2 with a low-pH buffer containing 0.1 M sodium acetate pH 4.0 
and 300mM ammonium sulphate in the presence of either 0.8% (v/v) Triton X-100 
(phage, low-pH buffer and 10% (v/v) Triton X-100 at a volume ratio of 1:10:1) or 
5mM liposomes (phage, low-pH buffer and liposomes at a volume ratio of 1:10:10), 
followed by incubation at 37 °C for 20h. The detergent-stabilized samples were 
gradually transferred to a buffer containing 10 mg ml⁻¹ amphipol A8-35 (Anatrace, 
A835) on an affinity grid and flash frozen in liquid ethane at 100 K. The affinity 
grids were prepared based on a recently published protocol²⁴. The activated carbon 
film grids were covalently bonded to the anti-φ29 gp9 polyclonal antibody. The 
low-pH-treated φ29-liposome complex was directly flash frozen in liquid ethane 
at 100 K using normal 400-mesh holey carbon grids (Quantifoil, 1.2 µm × 1.3 µm). 
To completely remove the genomic DNA, the low-pH-treated particles were further 
purified in an isopycnic 65% (w/v) CsCl gradient with 0.02% (w/v) Triton X-100. 
The CsCl gradient purified sample was directly frozen in liquid ethane at 100K 
using 400-mesh thin carbon coated holey carbon grids (Lantuo Jiangsu, China, 
2 µm × 2 µm). A total of 483 cryo-EM images of the phage in amphipol A8-35 
were recorded on a K2 Summit detector at a nominal magnification of 22,500 
(which yields a calibrated pixel size of 1.32 Å) using an FEI Titan Krios trans- 
mission electron microscope operated at 300 kV. A total of 150 cryo-EM images 
of the phage-liposome complex were recorded on a CCD camera at a nominal 
magnification of 29,000 (which yields a calibrated pixel size of 3.02 Å) using an FEI 
F20 transmission electron microscope operated at 200 kV. A total of 945 images of 
the low-pH-treated and CsCl-gradient-purified sample were recorded on a Falcon 
II direct electron detector at a nominal magnification of 58,000 (which yields a 
calibrated pixel size of 1.97 Å) using an FEI Tecnai Arctica transmission electron 
microscope operated at 200 kV. Individual phage particle images were selected 
and boxed with the program EMAN2 (http://blake.bcm.edu/emanwiki/EMAN2). 
The contrast transfer function (CTF) parameters were determined with the ‘ctfit’ 
routine in the EMAN package²⁵. Only the phases were corrected for the observed 
image data using the determined CTF parameters. The initial orientations and 
centres of the tails were determined by phage head reconstructions with the pro- 
gram EMAN assuming five-fold symmetry with the symmetry axis correspond- 
ing to the z-axis. The tail orientations were determined by searching the rotation 
around the z-axis and assuming six-fold symmetry for the reconstructions. The 
particle numbers used in the final calculation of the low-pH-treated, low-pH- 
treated and CsCl-gradient-purified, and low-pH-treated with liposomes samples 
were 3,420, 24,544 and 802, respectively. The resolutions of the final reconstructed 
detergent-stabilized tail, detergent-stabilized and CsCl-gradient-purified tail, and 
the tail-liposome complex maps were estimated to be ~15.4 Å, 10.1 Å and 34.5 Å, 
respectively, using the gold standard²⁶. 
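
The detergent concentration quoted above follows from the stated mixing ratio. A minimal sketch of that arithmetic is given below; the helper name `final_concentration` is an illustrative assumption, and the only inputs are the 10% (v/v) Triton X-100 stock and the 1:10:1 volume ratio given in the text.

```python
def final_concentration(stock: float, parts: tuple, index: int) -> float:
    """Concentration of one component after mixing the given volume parts.

    `stock` is the concentration of the component at position `index`
    before mixing; `parts` are the relative volumes of all components.
    """
    return stock * parts[index] / sum(parts)


# Phage : low-pH buffer : 10% (v/v) Triton X-100 mixed at 1:10:1 by volume.
triton_final = final_concentration(10.0, (1, 10, 1), index=2)
print(f"Final Triton X-100: {triton_final:.2f}% (v/v)")  # 0.83% (v/v)
```

One part of a 10% stock in twelve parts total gives about 0.83%, which rounds to the 0.8% (v/v) stated for the detergent-stabilized sample.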

Fluorescent dye release. Liposomes containing calcein (Sigma, C0875) were pre- 
pared using the same phospholipids (PE, PG, PC and CL at a molar ratio of 5:6:2:9) 
and a procedure similar to that described for the cryo-EM sample preparation but 
with a re-suspension buffer containing 10 mM sodium phosphate pH 7.4, 30 mM 
NaCl (0.1 x PBS) and 120 mM calcein. Free calcein that was not encapsulated into 
the liposomes was removed using a 5-ml HiTrap desalting column (GE) running 
in a buffer containing 10 mM sodium phosphate pH 7.4 and 30mM NaCl. Twenty 
microlitres of calcein-containing liposomes was mixed with an equal volume of the 
low-pH buffer containing 0.1 M sodium acetate pH 4.0 and 300mM ammonium 
sulphate. After adding 2 µl of phage solution (at a concentration of 5 x 10" particles 
per ml) into liposomes pre-incubated at 37 °C, 2 µl of the mixture was collected 
every 5 min. Each aliquot was immediately diluted into 1 ml of 0.1 x PBS buffer 
and the fluorescence signal of the dilution was measured using a Nanodrop 3300 
spectrofluorometer with the excitation and emission wavelengths set to 470 nm 
and 520nm, respectively. As a control, the calcein-containing liposomes mixed 
with the phage and a neutral-pH buffer containing 0.1 M sodium phosphate pH 7.4 
and 300 mM ammonium sulphate was measured using the same procedure. To 
determine the background fluorescence, the calcein-containing liposomes mixed 
with the low-pH buffer alone were also measured using the same procedure. All 
measurements were repeated at least three times. The data used for the gener- 
ation of Extended Data Fig. 5 were the mean values of the three independent 
measurements. 

Modelling of the post-ejection conformation. Modelling of the post-ejection 
conformation of the tail knob was performed using the program RosettaCM²¹ and 
the MDFF package²⁷. Only the L loops were left free to move during the modelling, 
whereas the rest of the structure and the interactions between the monomers were 
maintained with harmonic restraints. 

X-ray data collection, processing, structure determination, refinement and 
analysis. X-ray diffraction data were collected using synchrotron radiation 
at the Shanghai Radiation Facility beamline BL17U (Extended Data Table 2). 
The plate-shaped crystals of gp9Δ417-491 diffracted to ~2.0 Å and belong to 
space group I222, with three molecules in the asymmetric unit and cell param- 
eters of a = 94.6 Å, b = 135.2 Å and c = 313.4 Å. The diamond-shaped crystals of 
gp9Δ417-491 diffracted to ~2.6 Å and belong to space group P2₁3, with two 
molecules in the asymmetric unit and cell parameters of a = b = c = 184.7 Å. Crystals 
of full-length gp9 diffracted to ~3.5 Å and belong to space group P2₁3, with two 
molecules in the asymmetric unit and cell parameters of a = b = c = 183.4 Å. The 
data were integrated and scaled with the HKL2000 suite (Extended Data Table 2)²⁸. 
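
As a rough illustration of how the cell parameters and asymmetric-unit contents above relate to crystal packing, the sketch below computes the unit-cell volume available per molecule. It assumes orthogonal axes (all cell angles are 90° in Extended Data Table 2) and uses the standard number of general positions for each space group (12 for P2₁3, 8 for I222); the function and dictionary names are illustrative and not part of the authors' workflow.

```python
# Number of general positions (asymmetric units per unit cell).
GENERAL_POSITIONS = {"P2(1)3": 12, "I222": 8}


def volume_per_molecule(space_group: str, a: float, b: float, c: float,
                        molecules_per_asu: int) -> float:
    """Unit-cell volume (cubic angstroms) divided by molecules per cell.

    Valid only for cells with all angles equal to 90 degrees.
    """
    molecules_per_cell = GENERAL_POSITIONS[space_group] * molecules_per_asu
    return a * b * c / molecules_per_cell


diamond = volume_per_molecule("P2(1)3", 184.7, 184.7, 184.7, molecules_per_asu=2)
plate = volume_per_molecule("I222", 94.6, 135.2, 313.4, molecules_per_asu=3)

print(f"Diamond-shaped crystals: {diamond:,.0f} cubic angstroms per molecule")
print(f"Plate-shaped crystals:   {plate:,.0f} cubic angstroms per molecule")
```

Both crystal forms contain 24 molecules per unit cell under these assumptions, but the cubic form leaves considerably more volume per molecule, which would be consistent with a higher solvent content and its lower diffraction limit.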

The structure of gp9Δ417-491 was determined by SAD using the diamond- 
shaped crystals and the anomalous signals of mercury atoms as measured at a 
wavelength of 0.97923 Å. Heavy atom sites were located using the program 
SHELX²⁹. The heavy atom parameters were refined and the initial phases were 
calculated using reflections in the resolution region of 50 Å to 2.6 Å with the pro- 
gram SHARP³⁰. The calculated phases were gradually improved and extended to 
a higher resolution using solvent density flattening with the program DM³¹. The 
resulting electron density maps are of good quality, in which most of the residues 
can be clearly recognized. The structures of full-length gp9 and gp9Δ417-491 in 
the plate-shaped crystals were determined by molecular replacement³². The full- 
length gp9 structure was refined to an Rwork/Rfree of 0.201/0.229. Of all the residues, 
93% are in the most favoured regions of the Ramachandran plot and 1.1% (10 of 
1140) of the residues are in the disallowed regions. The gp9Δ417-491 structure 
in the diamond-shaped crystals was refined to an Rwork/Rfree of 0.166/0.201. Of all 
the residues, 95% are in the most favoured regions of the Ramachandran plot and 
0.3% (3 of 1021) of the residues are in the disallowed regions. The gp9Δ417-491 
structure in the plate-shaped crystals was refined to an Rwork/Rfree of 0.153/0.183. Of 
all the residues, 97% are in the most favoured regions of the Ramachandran plot 
and no residue is in the disallowed regions. The densities of residues 437-457 of 
full-length gp9 are not visible in the final 3.5 Å electron density map. 

The program COOT was used for model building and for making adjustments³³. 
The program PHENIX was used to refine the structure (Extended Data Table 2)³⁴. 
The program SITUS¹⁶ was used to fit the structure into the six-fold-averaged 
cryo-EM density of the tail knob in the reconstruction of the φ29 mature virus¹⁰. 
Some of the figures were prepared with the programs Chimera³⁵ and ESPript³⁶. 
Cryo-EM maps used in Fig. 4 are DNA-filled mature (low-pass filtered to a reso- 
lution of 1/12 Å⁻¹ and at a contouring level of 4.0σ), sodium-perchlorate-treated 
and CsCl-gradient-purified (low-pass filtered to a resolution of 1/25 Å⁻¹ and at a 
contouring level of 2.0σ), low-pH-treated and CsCl-gradient-purified (low-pass fil- 
tered to a resolution of 1/10 Å⁻¹ and at a contouring level of 3.9σ), low-pH-treated 
(low-pass filtered to a resolution of 1/15 Å⁻¹ and at a contouring level of 2.1σ) and 
low-pH-treated with liposomes (low-pass filtered to a resolution of 1/26 Å⁻¹ and 
at a contouring level of 3.0σ) tails. 

Extended Data Figure 1 | Mass spectrometry and sequence analysis 
showing the disordered region of gp9. a, Western blotting and mass 
spectrometry analysis of the recombinant full-length and degraded 
gp9 proteins. Positions of the full-length and degraded gp9 proteins are 
indicated with arrows and labelled with P5-P1 in the western blotting 
image. Corresponding peaks in the mass spectra are labelled in the same 
order as in the western blot. b, Disordered region analysis of gp9 using the 
online tool ProDOS³⁷. The threshold (red line) was set based on a false 
positive rate of 0.15. Residues with a disorder probability of more than 0.38 
are considered disordered. The start and end positions of several predicted 
long disordered regions are indicated with red arrows. c, Diagrams 
showing the location of the predicted disordered region in the sequence 
and the fragments determined by the mass spectrometry analysis. 

Extended Data Figure 2 | Surface charge distribution of gp9. a, Diagram 
showing the surface charge distribution of gp9Δ417-491. Negative and 
positive electrostatic potentials are coloured red and blue, respectively. 
The monomers within the hexamer interact with each other primarily 
through hydrogen bonds and hydrophobic interactions. The outer 
cylindrical surface of the tube is hydrophilic. The top and bottom surfaces 
of the tube have substantially different features. The top surface on the 
N-β-barrel domain side is largely hydrophobic, whereas the bottom 
surface on the tip β-domain side is rich in negatively charged residues. 
b, Diagram showing the surface charge distribution of the interior L loops. 
Negative and positive electrostatic potentials are coloured as in a. A thin 
central slice of the tube is shown and coloured dark grey to indicate the 
tube boundary. 

Extended Data Figure 3 | Fitted gp9 structure in a cryo-EM map of the 
tail. Diagram showing the gp9 hexamer structure fitted in a cryo-EM 
map of the tail contoured at 4.0σ. An ~45-Å-thick cross-section is shown. 
The gp9 hexamer structure is shown as a ribbon representation with 
the L loops coloured red. The density map is shown as a solid surface 
representation and coloured semi-transparent grey. 

Extended Data Figure 4 | Electron microscopy images showing the 
mature φ29-liposome complex. a, Cryo-EM images of mature φ29 
particles with B. subtilis cell membrane-like liposomes at neutral pH. 
b, Negatively stained electron microscopy images showing the 
low-pH-treated, DNA-emptied φ29 particles aggregating around a 
liposome that contains lipids similar to those in the eukaryotic cell 
membrane. Scale bars, 200 nm. 

Extended Data Figure 5 | Release of calcein from liposomes induced 
by φ29. Time-course release of calcein from liposomes induced by φ29 
in a low- or neutral-pH buffer is shown in solid lines with blue (low pH) 
or red (neutral pH) triangles representing the data points. Percentages of 
the calcein released from the liposomes were calculated from 
(F − F0)/(Fmax − F0), where F is the measured fluorescence value of the low- 
or neutral-pH sample, F0 is the background fluorescence and Fmax is the 
maximum fluorescence value measured after adding Triton X-100 at a final 
concentration of 0.02% (v/v). F0 (the x axis) is generated from a linear 
least-squares fit of the fluorescence values measured on a liposome-low-pH 
buffer mixture without adding the phage. The data used for the linear 
least-squares fit of the background fluorescence are represented by green 
triangles. The data are expressed as the mean ± s.d. of three independent 
measurements. 
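
The release percentage defined in this legend can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis script: the function names are hypothetical, the fluorescence readings are invented for demonstration, and the background F0 is taken from an ordinary least-squares line through the phage-free readings, as the legend describes.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def percent_release(f_sample, f_background, f_max):
    """Calcein release in percent: 100 * (F - F0) / (Fmax - F0)."""
    return 100.0 * (f_sample - f_background) / (f_max - f_background)


# Hypothetical readings (arbitrary fluorescence units), for illustration only.
times_min = [0, 5, 10, 15]
background_readings = [10.0, 10.5, 11.0, 11.5]  # liposomes + low-pH buffer, no phage

slope, intercept = fit_line(times_min, background_readings)
f0_at_10_min = slope * 10 + intercept  # fitted background at the 10-min time point
release_at_10_min = percent_release(60.0, f0_at_10_min, 110.0)

print(f"Fitted background at 10 min: {f0_at_10_min:.2f}")
print(f"Release at 10 min: {release_at_10_min:.1f}%")
```

Fitting the background rather than subtracting a single blank reading accounts for slow, phage-independent leakage of calcein over the time course.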




Extended Data Figure 6 | Structure models of the pre- and post-DNA- 
ejection L loops. a, Post-DNA-ejection L loop structure modelled with 
RosettaCM using the electron microscopy map as a restraint and fitted 
into the post-DNA-ejection tail electron microscopy map. The gp9 
structure is shown in a ribbon representation. The L loops are coloured 
red. The rest of the structure is coloured grey. b, Structural comparison 
of the pre- and post-DNA-ejection L loop structures. Left, pre- 
DNA-ejection gp9 structure shown in a ribbon representation with the L 
loops coloured red. Right, post-DNA-ejection gp9 structure shown in a 
ribbon representation with the modelled structure of the flexible region 
coloured cyan and the rest of the L loop coloured red. 

[Alignment panels of Extended Data Figure 7. a, Bacteriophage φ29 gp9, 
bacteriophage Nf tail protein, Staphylococcus phage GRCS tail protein, 
Enterococcus phage EF62phi tail protein and Staphylococcus phage PSa3 
tail protein. b, HIV gp41, φ29 gp9, influenza virus A HA and T4 gp5.] 

Extended Data Figure 7 | Sequence alignments of the gp9 L-loop-like 
peptides in phage tail proteins. a, Sequence alignment of the gp9 L-loop 
and L-loop-like peptides from short non-contractile tails. Conserved 
residues are boxed and coloured red. The core hydrophobic region of the 
L loop is indicated by a dashed box. b, Sequence alignment of gp9 L loop, 
HIV fusion peptide, influenza fusion peptide and a potential hydrophobic 
membrane active peptide of the bacteriophage T4 tail protein gp5. 
Conserved residues are boxed and coloured red. Completely conserved 
residues are shown in white on a red background. The core region of the 
HIV fusion peptide is indicated by a dashed box. c, Schematic diagram 
showing a possible mechanism for the exposure of the potential T4 
hydrophobic membrane active peptide during infection. The structure of 
the T4 gp27 (top β-barrels) and gp5 (bottom) complex is shown in ribbon 
representation. The potential hydrophobic peptide of gp5 is coloured 
red. The rest of the complex structure is coloured grey. The potential 
hydrophobic peptide is exposed after the release of the gp5 C-terminal 
needle. Conformational changes of the lysozyme domain trigger the 
insertion of the hydrophobic peptide into the membrane. 

Extended Data Table 1 | Primers used for the loop deletion mutants of gp9 

Construct | Primer | Sequence 
Common | Forward | 5′-CCCATGGCATATGTACCATTATCAGGAACG-3′ 
Common | Reverse | 5′-CCGCTCGAGTCAGTGGTGGTGGTGGTGGTGCCTCAATTCATTCTCGACGC-3′ 
gp9Δ412-491 | Forward | 5′-TAGCAATACTAAATGACCAGTTAACGAAAATGGG-3′ 
gp9Δ412-491 | Reverse | 5′-CCCATTTTCGTTAACTGGTCATTTAGTATTGCTA-3′ 
gp9Δ412-486 | Forward | 5′-TAGCAATACTAAATGACGCAAACATTCCGCCGCAG-3′ 
gp9Δ412-486 | Reverse | 5′-CTGCGGCGGAATGTTTGCGTCATTTAGTATTGCTA-3′ 
gp9Δ412-481 | Forward | 5′-TAGCAATACTAAATGACAAGCAAGCCGATATAGC-3′ 
gp9Δ412-481 | Reverse | 5′-GCTATATCGGCTTGCTTGTCATTTAGTATTGCTA-3′ 
gp9Δ417-491 | Forward | 5′-CTATCTATCTGCTTATCAGTTAACGAAAATGG-3′ 
gp9Δ417-491 | Reverse | 5′-CCATTTTCGTTAACTGATAAGCAGATAGATAG-3′ 
gp9Δ417-486 | Forward | 5′-GACTATCTATCTGCTTATGCAAACATTCCGCCGCA-3′ 
gp9Δ417-486 | Reverse | 5′-TGCGGCGGAATGTTTGCATAAGCAGATAGATAGTC-3′ 
gp9Δ417-481 | Forward | 5′-CTATCTATCTGCTTATAAGCAAGCCGATATAG-3′ 
gp9Δ417-481 | Reverse | 5′-CTATATCGGCTTGCTTATAAGCAGATAGATAG-3′ 
gp9Δ422-491 | Forward | 5′-ATTTACAGGGCAACAAACAGTTAACGAAAATGGG-3′ 
gp9Δ422-491 | Reverse | 5′-CCCATTTTCGTTAACTGTTTGTTGCCCTGTAAAT-3′ 
gp9Δ422-486 | Forward | 5′-ATTTACAGGGCAACAAAGCAAACATTCCGCCGCA-3′ 
gp9Δ422-486 | Reverse | 5′-CTGCGGCGGAATGTTTGCTTTGTTGCCCTGTAAAT-3′ 
gp9Δ422-481 | Forward | 5′-ATTTACAGGGCAACAAAAAGCAAGCCGATATAGC-3′ 
gp9Δ422-481 | Reverse | 5′-GCTATATCGGCTTGCTTTTTGTTGCCCTGTAAAT-3′ 

Common forward and reverse primers for the overlap PCRs are listed in the top two rows of the table. Specific overlap PCR primers for each mutant are listed after the name of each deletion mutant. 
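
In two-step overlapping PCR, the mutant-specific forward and reverse primers anneal to each other across the deletion junction, so they are expected to be reverse complements. A minimal sketch of that consistency check for the gp9Δ417-491 pair is shown below; the function name is an illustrative assumption, and the split of the forward primer into two 16-nucleotide halves around the junction is inferred from the prefix shared by all three Δ417 forward primers, not stated in the table.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (upper-case ACGT only)."""
    return seq.translate(COMPLEMENT)[::-1]


# Mutant-specific overlap primers for gp9Δ417-491 from Extended Data Table 1.
forward = "CTATCTATCTGCTTATCAGTTAACGAAAATGG"
reverse = "CCATTTTCGTTAACTGATAAGCAGATAGATAG"

primers_are_complementary = reverse_complement(forward) == reverse

# Assumed junction position: the three Δ417 forward primers share this prefix,
# so the remainder should anneal downstream of the deleted region.
upstream_half = "CTATCTATCTGCTTAT"
downstream_half = forward[len(upstream_half):]

print("Reverse complement match:", primers_are_complementary)
print("Upstream of deletion:  ", upstream_half)
print("Downstream of deletion:", downstream_half)
```

The same check can be applied to each primer pair in the table to catch transcription errors in the sequences.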

Extended Data Table 2 | Crystal data collection and refinement statistics 

 | gp9Δ417-491 | gp9Δ417-491 Hg-SAD | gp9Δ417-491 | gp9 full-length 
Data collection 
Space group | P2₁3 | P2₁3 | I222 | P2₁3 
Cell dimensions 
a, b, c (Å) | 184.67, 184.67, 184.67 | 184.25, 184.25, 184.25 | 94.6, 135.16, 313.43 | 183.45, 183.45, 183.45 
α, β, γ (°) | 90, 90, 90 | 90, 90, 90 | 90, 90, 90 | 90, 90, 90 
Wavelength (Å) | 0.97884 | 0.97923 | 0.97884 | 0.98 
Resolution (Å) | 49.35-2.60 | 50.00-3.50 | 43.32-2.04 | 45.86-3.50 
Rsym or Rmerge* | 0.123 (0.527) | 0.200 (0.591) | 0.105 (0.281) | 0.135 (0.456) 
I/σI* | 16.49 (7.40) | 16.43 (5.22) | 17.66 (9.20) | 18.36 (5.54) 
Completeness (%)* | 99.99 (100.00) | 100.00 (100.00) | 99.29 (95.11) | 100.00 (100.00) 
Redundancy* | 11.4 (11.3) | 11.5 (11.7) | 6.1 (5.9) | 5.4 (5.6) 
Refinement 
Resolution (Å) | 49.35-2.60 | 50.00-3.50 | 43.32-2.04 | 45.86-3.50 
No. reflections | 64243 | 26595 | 126814 | 26215 
Rwork / Rfree (%) | 16.64/20.07 | | 15.31/18.32 | 
No. atoms 
Protein | | | 12504 | 
Ligand/ion | | | 21 | 
Water | | | 1312 | 
B-factors | 40.5, 51.8 (rows not assigned in source) | | | 
Protein | | | 27.3 | 
Ligand/ion | | | 36.7 | 
Water | | | 48.9 | 
R.m.s. deviations 
Bond lengths (Å) | 0.009 | | 0.008 | 
Bond angles (°) | 1.18 | | 1.05 | 

AMPK-SKP2-CARM1 signalling cascade in
transcriptional regulation of autophagy


Hi-Jai R. Shin1*, Hyunkyung Kim1*, Sungryong Oh1, Jun-Gi Lee1, Minjung Kee1, Hyun-Jeong Ko2, Mi-Na Kweon3, Kyoung-Jae Won4 & Sung Hee Baek1


Autophagy is a highly conserved self-digestion process, which is 
essential for maintaining homeostasis and viability in response to 
nutrient starvation. Although the components of autophagy in
the cytoplasm have been well studied, the molecular basis for the
transcriptional and epigenetic regulation of autophagy is poorly 
understood. Here we identify co-activator-associated arginine 
methyltransferase 1 (CARM1) as a crucial component of autophagy
in mammals. Notably, CARM1 stability is regulated by the SKP2- 
containing SCF (SKP1-cullin1-F-box protein) E3 ubiquitin ligase 
in the nucleus, but not in the cytoplasm, under nutrient-rich 
conditions. Furthermore, we show that nutrient starvation results in 
AMP-activated protein kinase (AMPK)-dependent phosphorylation 
of FOXO3a in the nucleus, which in turn transcriptionally represses 
SKP2. This repression leads to increased levels of CARM1 protein 
and subsequent increases in histone H3 Arg17 dimethylation. 
Genome-wide analyses reveal that CARM1 exerts transcriptional
co-activator function on autophagy-related and lysosomal genes 
through transcription factor EB (TFEB). Our findings demonstrate 
that CARM1-dependent histone arginine methylation is a crucial 
nuclear event in autophagy, and identify a new signalling axis of 
AMPK-SKP2-CARM1 in the regulation of autophagy induction 
after nutrient starvation. 

To explore the importance of nuclear events in autophagy, we 
proposed that specific histone marks are involved in the epigenetic 
and transcriptional regulation of autophagy in the nucleus leading to 
the fine-tuning of the autophagy process. We induced autophagy in 
mouse embryonic fibroblasts (MEFs) by glucose starvation, and sought 
to identify altered specific histone marks. We observed an increase 
in histone H3 Arg17 dimethylation (H3R17me2) levels in response 
to glucose starvation (Fig. 1a), which also occurred when autophagy 
was triggered by amino acid starvation or rapamycin (Extended Data 
Fig. 1a). Notably, nutrient starvation resulted in increased levels of 
CARM1 protein (Fig. 1b and Extended Data Fig. 1b).

To examine whether CARM1 induction and subsequent increases 
in H3R17me2 are related to autophagy occurrence, we analysed the 
conversion of non-lipidated LC3-I to lipidated LC3-II, as a common 
marker of autophagic activity. The increase in CARM1 was associated
with an increase in LC3-II (Fig. 1c and Extended Data Fig. 1c, d).
To confirm that the decrease in LC3-II reflects decreases in functional
autophagic degradation, autophagic flux was also analysed using the
levels of p62 (also known as SQSTM1). Glucose starvation induced
p62 degradation and LC3-II accumulation in wild-type MEFs but not 
in Carm1 knockout and knock-in MEFs expressing the enzymatic 
activity-deficient mutant (Fig. 1c). 

To evaluate the role of CARM1 in the autophagic process, the formation
of green fluorescent protein (GFP)-tagged LC3-positive autophagosomes
was examined. The increase in GFP-LC3 punctate cells was
notably attenuated in Carm1 knockout compared to wild-type MEFs
(Fig. 1d and Extended Data Fig. 1e). Transmission electron microscopy
(TEM) further showed an increase in the number of autophagic
vesicles in wild-type MEFs, but not in Carm1 knockout and knock-in
MEFs (Fig. 1e). We performed LC3 flux analysis using bafilomycin
A1, an inhibitor of the late phase of autophagy. Defects in autophagic 
flux caused by the loss of CARM1 were confirmed by immunoblot 
analysis (Extended Data Fig. 2a, b) and imaging experiments using 
mCherry-GFP-LC3, which provides a simultaneous readout of 
autophagosome formation and maturation (Extended Data Fig. 2c). 
In addition, ellagic acid, a naturally occurring polyphenol reported 
to selectively inhibit H3R17me2 (ref. 10), greatly compromised the 
autophagic process (Fig. 1f and Extended Data Fig. 2d-f). 

Next, we examined how CARM1 induction is regulated after glucose 
starvation. We found that CARM1 protein levels were increased only in 
the nucleus after glucose starvation (Fig. 2a, left). Treatment with MG132,
a 26S proteasome inhibitor, inhibited nuclear CARM1 degradation 
(Fig. 2a, right). Glucose starvation markedly reduced the ubiquitina- 
tion of CARM1 in the nucleus, whereas CARM1(K471R) failed to be 
ubiquitinated, indicating that K471 is the ubiquitination-targeting site 
(Fig. 2b and Extended Data Fig. 3a). We then sought to identify the E3 
ubiquitin ligase responsible for CARM1 ubiquitination. Notably, SKP2, 
an F-box protein of the SCF E3 ubiquitin ligase complex, was identified 
as a CARM1-binding protein along with cullin 1 (CUL1) (Fig. 2c and 
Supplementary Table 1). CARM1 exhibited specific binding to SKP2 
(Fig. 2d) and CUL1 (Extended Data Fig. 3b). 

Since CARM1 is stabilized after glucose starvation and possibly
ubiquitinated by the SKP2-containing E3 ligase complex under
nutrient-rich conditions, we checked for changes in SKP2 protein
levels. A reduction in SKP2 and an increase in CARM1 protein levels 
were observed in glucose-starved cells (Fig. 2e). Decreased levels of 
SKP2 resulted in the stabilization of other known SKP2-SCF substrates 
(Extended Data Fig. 3c). Furthermore, SKP2 knockdown attenuated 
CARM1 ubiquitination in the nucleus (Fig. 2f) and markedly increased
the half-life of CARM1 (Extended Data Fig. 3d). By contrast, overexpression
of wild-type SKP2, but not the SKP2ΔF mutant that is not
able to form a SKP2-SCF complex, decreased the half-life and protein
levels of CARM1 in cells deprived of glucose (Fig. 2g, h and Extended 
Data Fig. 3e). We speculate that exclusive nuclear localization of SKP2 
results in selective CARM1 ubiquitination in the nucleus. As a result of 
SKP2 downregulation, the interaction between CUL1 and CARM1 sig- 
nificantly decreased after glucose starvation (Extended Data Fig. 4a). 
Also, as a component of the SCF complex, CUL1 regulated CARM1 
protein levels (Extended Data Fig. 4b-e). Collectively, these data indi- 
cate that the SKP2-containing SCF E3 ligase complex is responsible for 
CARM1 degradation in the nucleus under nutrient-rich conditions 
(Fig. 2i). 
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Figure 1 | Increased H3R17 dimethylation by CARM1 is critical 
for proper autophagy. a, b, Immunoblot analysis of various histone 
marks and CARM1 in response to glucose starvation (Glc starv.). 

c, Wild-type (WT), Carm1 knockout (KO) or knock-in (KI) MEFs were 
subject to immunoblot analysis. The LC3-II/LC3-I ratio is indicated. 

d, Representative confocal images of GFP-LC3 puncta formation. 
Graph shows quantification of LC3-positive punctate cells (right). 


It has been shown that AMPK is activated during glucose starvation
and leads to starvation-induced autophagy. As the role of nuclear
AMPK in autophagy outcome has not been defined thus far, we aimed
to examine whether AMPK is involved in the transcriptional regulation
of autophagy. We found that AMPKα2 and phosphorylated AMPK,
the activated form of AMPK, increased in the nucleus after glucose
starvation (Fig. 3a). Increased AMPKα2 resulted from transcription
induction rather than post-translational regulation (Extended Data
Fig. 5a-c). AMPKα2 has been shown to be preferentially expressed
in the nucleus, suggesting that it might perform distinct roles in the
nucleus. AMPK failed to directly bind or phosphorylate CARM1 and
SKP2 (Extended Data Fig. 5d, e). However, AMPK activation by aminoimidazole
carboxamide ribonucleotide (AICAR) and phenformin
resulted in the increase of CARM1 and reduction of SKP2 (Extended
Data Fig. 5f), and this was compromised when AMPK activity was
blocked by compound C (Extended Data Fig. 5g).

We then used wild-type and Ampkα1 and Ampkα2 (also known
as Prkaa1 and Prkaa2) double knockout (DKO) MEFs to check for
the expression of CARM1 and SKP2. In the nucleus, CARM1 induction
and SKP2 reduction after glucose starvation were abrogated in
Ampk DKO MEFs (Fig. 3b). The half-life of CARM1 in the nucleus was
decreased in Ampk DKO MEFs (Extended Data Fig. 5h). Introduction
of wild-type AMPKα2, but not the dominant-negative form, in Ampk
DKO MEFs resulted in a recovered expression pattern of SKP2 and
CARM1, similar to wild-type MEFs (Extended Data Fig. 5i). SKP2
depletion in Ampk DKO MEFs led to increased CARM1 protein
levels, indicating that the reduction of CARM1 in Ampk DKO MEFs is
Nuclei counterstained with DAPI. Scale bar, 10 μm. e, Representative TEM
images. Scale bar, 2 μm. High magnification of boxed areas is shown on
the right. Scale bar, 0.5 μm. Autophagosomes (blue arrows), autolysosomes
(red arrows) and multilamellar body (yellow arrow). f, Representative
confocal images of GFP-LC3 puncta formation. Ellagic acid (100 μM).
Scale bar, 10 μm. Bars, mean ± s.e.m.; n = 5, with over 100 cells; **P < 0.01
(one-tailed t-test) (d, f).


mediated by SKP2 (Extended Data Fig. 5j). Furthermore, since binding
of CARM1 to CUL1 is mediated by SKP2, the CARM1-CUL1 interaction
was maintained upon glucose starvation in Ampk DKO MEFs
(Extended Data Fig. 5k).

Reduction of SKP2 expression after glucose starvation is not mediated
by proteasomal degradation (Extended Data Fig. 5l), but instead
regulated at the transcription level (Fig. 3c). Glucose starvation failed
to decrease Skp2 mRNA levels in Ampk DKO MEFs, but reconstitution
of wild-type AMPKα2 restored the reduction in Skp2 mRNA (Fig. 3d).
Therefore, we were prompted to search for a possible regulatory mechanism
of SKP2 downregulation by AMPKα2. Recent studies have emphasized
the AMPK-FOXO axis as a highly conserved nutrient-sensing
pathway crucial for cellular and organismal homeostasis. AMPK
directly phosphorylates FOXO3a and regulates FOXO3a transcriptional
activity. Although mainly known as a transcriptional activator,
FOXO also functions as a transcriptional repressor. Skp2 promoter
analysis revealed a highly conserved FOXO response element (FRE)
(Fig. 3e). We proposed that FOXO might function as a transcriptional
repressor of SKP2 and performed a luciferase reporter assay driven by
the Skp2 promoter. Glucose starvation attenuated luciferase activity
driven by the wild-type Skp2 promoter, but not that driven by the Skp2
promoter containing an FRE mutation (Fig. 3e). Skp2 mRNA levels failed
to decrease in Foxo1/3/4 triple
knockout (TKO) MEFs (Extended Data Fig. 5m), indicating that FOXO
is a crucial transcription factor in the repression of SKP2.

Glucose starvation resulted in AMPK-dependent FOXO3a phosphorylation
(Extended Data Fig. 5n). In addition, AMPKα2 and
phosphorylated FOXO3a were co-recruited to the Skp2 promoter
Figure 4 | CARM1 exerts a transcriptional co-activator function on
autophagy-related and lysosomal genes through TFEB. a, Binding
between CARM1 and TFEB. b, Representative confocal images.
Scale bar, 10 μm. c, 2x CLEAR (TFEB RE)-luciferase reporter assays.
Bars, mean ± s.e.m.; n = 3. **P < 0.01 (one-tailed t-test). d-f, ChIP assays
on TFEB-dependent, CARM1-dependent (d, e) or CARM1-independent (f)
promoters after knockdown of TFEB. Bars, mean ± s.e.m.; n = 3.


upon glucose starvation (Fig. 3f). The recruitment of phosphorylated
FOXO3a accompanied by a decrease in RNA polymerase II was also
observed in Ampk DKO MEFs reconstituted with wild-type AMPKα2
(Extended Data Fig. 5o). Notably, reconstitution of wild-type FOXO3a
in Foxo1/3/4 TKO MEFs significantly reduced the Skp2 mRNA level,
but neither the FOXO3a(H212R) DNA-binding mutant nor the
FOXO3a sextuple SA mutant, which is not phosphorylated by AMPK,
reduced Skp2 mRNA levels (Fig. 3g). Furthermore, after glucose starvation,
phosphorylated FOXO3a, but not the FOXO3a SA mutant,
was recruited to the Skp2 promoter (Fig. 3h), indicating that AMPK-dependent
FOXO3a phosphorylation is crucial for the recruitment
of FOXO3a to the Skp2 promoter. SKP2 expression failed to decrease
and autophagy occurrence was impaired in FOXO3a SA mutant-reconstituted
Foxo1/3/4 TKO MEFs (Fig. 3i).

We observed a marked increase in autophagy occurrence in Ampk 
DKO MEFs after SKP2 knockdown (Fig. 3j, k). We also tested whether 
CARM1 overexpression could restore autophagy in Ampk DKO MEFs. 
Introduction of wild-type or K471R mutant CARM1 restored the 
number of GFP-LC3 punctate cells, whereas enzymatic-dead mutant 
CARM1(R169A) failed to do so (Extended Data Fig. 5p). Collectively, 
we found a signalling axis in autophagy induction in which glucose 
starvation activates AMPKα2 in the nucleus, leading to transcriptional
repression of Skp2 via FOXO3a phosphorylation. Reduction of SKP2 
expression in turn leads to increased levels of CARM1. 

To gain insight into the role of CARM1 in transcriptional regulation 
of autophagy, we performed RNA-sequencing (RNA-seq) in wild-type 
and Carm1 knockout MEFs after glucose starvation (Extended Data 
Fig. 6a, b). Using a comprehensive list of known autophagy-related 


556 | NATURE | VOL 534 | 23 JUNE 2016 


[Figure 4h, i panel residue: immunoblot rows labelled p62 and Tubulin; bar chart of Atg14 and Map1lc3a expression; legend: Fed, Fasted, Fed + ellagic acid, Fasted + ellagic acid.]


g, Wild-type and Carm1 knockout MEFs transfected with Flag-TFEB were
subjected to immunoblot analysis. h, Liver tissues from fed or fasted mice
treated with vehicle or ellagic acid were subjected to immunoblot analysis
(n = 3 per group). i, Expression of autophagy-related genes and lysosomal
genes in wild-type mouse livers. Bars, mean ± s.e.m.; n = 3 per group.
*P < 0.05, **P < 0.01 (two-tailed t-test).
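The captions report bars as mean ± s.e.m. with n = 3 per group and compare groups with a t-test. As a minimal sketch of how such summary values can be computed, the Python below uses only the standard library. The replicate values are hypothetical and are not data from the paper, and the use of Welch's form of the t statistic is an assumption, since the captions do not say which variant was applied. No P value is computed here.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, standard error of the mean) for replicate measurements."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two replicates")
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(n)


def welch_t(a, b):
    """Welch's t statistic for two independent groups (assumed variant)."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        var_a / len(a) + var_b / len(b)
    )


# Hypothetical relative expression values, n = 3 per group.
fed = [1.0, 1.2, 0.8]
fasted = [3.0, 3.4, 2.6]

print("fed    mean, s.e.m.:", mean_sem(fed))
print("fasted mean, s.e.m.:", mean_sem(fasted))
print("t statistic:", welch_t(fasted, fed))
```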
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and lysosomal genes (Supplementary Table 2), we found that poten- 
tial CARM1 target genes (cluster 1) are significantly enriched for 
autophagy-related and lysosomal genes (Extended Data Fig. 6c). 
Transcription factor motif analysis indicated TFEB as a putative major 
transcription factor for CARM1 (Extended Data Fig. 6d). We validated 
CARM1 dependency of the autophagy-related and lysosomal genes by 
quantitative reverse transcription PCR (qRT-PCR) (Extended Data 
Fig. 6e). Furthermore, we performed chromatin immunoprecipitation 
with high-throughput sequencing (ChIP-seq) of H3R17me2 in wild-
type MEFs after glucose starvation and observed enriched H3R17me2 
as well as activating H3K4me3 signals at active promoters (Extended 
Data Fig. 6f-h and Supplementary Table 3). 

TFEB functions as a master regulator of lysosomal biogenesis and 
autophagy. After glucose starvation, CARM1 and TFEB exhibited
mutual binding in the nucleus (Fig. 4a, b and Extended Data Fig. 7a).
The binding of CARM1 to TFEB was not affected by AMPK (Extended
Data Fig. 7b). CARM1 binds to the transcriptional activation domain
of TFEB, whereas TFEB binds to the methyltransferase domain of
CARM1 (Extended Data Fig. 7c, d). Although CARM1 also interacts
with TFE3, TFEB knockdown, but not TFE3 knockdown, markedly 
altered the transcription induction of various target genes (Extended 
Data Fig. 7e-h). 

Introduction of TFEB increased CLEAR-element-containing lucif- 
erase reporter activity and overexpression of CARM1 further enhanced 
its activity (Fig. 4c). To examine whether CARM1-dependent target 
genes are regulated by TFEB, we searched for putative CLEAR motif 
(Supplementary Table 2) and performed ChIP assays. Knockdown of 
TFEB abolished the recruitment of CARM1 to its target promoters, 
subsequently leading to the failure of H3R17me2 induction (Fig. 4d, e).
CARM1 recruitment was not observed on CARM1-independent promoters
(Fig. 4f). Conversely, a subset of TFEB target genes failed to
increase upon glucose starvation after CARM1 knockdown (Extended 
Data Fig. 7i). CARM1 depletion was accompanied by a reduction in 
H3R17me2 on TFEB-dependent, CARM1-dependent target promoters, 
with little or no effect on TFEB recruitment (Extended Data Fig. 8a, b). 
Immunoblot analysis confirmed several key autophagy regulators that 
are transcriptionally regulated by CARM1 were induced by glucose 
starvation in wild-type MEFs, but not in Carm1 knockdown or knock- 
out MEFs (Extended Data Fig. 8c). Furthermore, a two-step ChIP assay 
confirmed the recruitment of CARM1 at TFEB-bound genes (Extended 
Data Fig. 8d). 

Previous studies reported that overexpression of TFEB induces autophagy.
However, introduction of TFEB in Carm1 knockout MEFs
failed to increase the formation of autophagosomes and levels of LC3-II 
(Fig. 4g and Extended Data Fig. 8e). As CARM1 fails to increase upon 
glucose starvation in Ampk DKO MEFs, TFEB-dependent, CARM1- 
dependent target gene expression and induction of H3R17me2 were 
dampened in Ampk DKO MEFs (Extended Data Fig. 9a, b). However, 
SKP2 knockdown significantly increased the mRNA levels of CARM1 
target genes (Extended Data Fig. 9c), indicating that partial recovery 
of autophagy in Ampk DKO MEFs by SKP2 knockdown is due to 
transcriptional activation of autophagy-related and lysosomal genes. 
Collectively, these data indicate CARM1 as a crucial co-activator of
TFEB. 

To examine whether CARM1 and subsequent H3R17me2 are impor- 
tant for autophagy occurrence in vivo, we analysed hepatic autophagy 
in wild-type mice. Livers of fasted mice showed a marked increase in 
CARM1 levels, as well as an increase in LC3 conversion. However, LC3 
conversion was greatly attenuated in fasted livers of mice treated with 
ellagic acid (Fig. 4h). Furthermore, the mRNA expression of various 
CARM1-dependent autophagy-related and lysosomal genes failed to 
increase (Fig. 4i). Ellagic acid treatment inhibited the induction of a 
subset of autophagy-related and lysosomal genes regulated by CARM1, 
and blocked the recruitment of CARM1, but not TFEB, along with 
reduced H3R17me2 levels at CARM1-dependent promoters (Extended 
Data Fig. 9d-f). Given that the inhibition of H3R17me2 by ellagic acid
almost completely blocks CARM1-induced autophagy occurrence, 
ellagic acid might have the potential to be developed as a therapeutic 
agent in autophagy-related diseases. 

Here, we provide a link between energy sensing, chromatin modi- 
fications and transcriptional and epigenetic regulation of autophagy 
(Extended Data Fig. 10). Although our current work is focused on 
CARM1 stabilization, we speculate that this type of regulation in the
nucleus might be an efficient way to regulate target gene expression, 
and could be a prototype of protein stabilization for histone modifiers. 
In addition, our data indicate that when glucose starvation persists and 
transcription of various autophagy-related genes is needed to sustain 
autophagy, AMPK accumulates in the nucleus and actively controls 
transcription. Our findings shed light on the potential therapeutic 
targeting of a new signalling axis of AMPK-SKP2-CARM1 in 
autophagy-related diseases. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
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METHODS 

Antibodies and reagents. The following commercially available antibodies
were used. Anti-AMPKα1 (ab110036), anti-AMPKα2 (ab3760), anti-ATG14
(ab173943), anti-FOXO3a (ab12162), anti-histone H3 (ab1791), anti-H3R17me2
(ab8284), anti-H3K4me3 (ab8580), anti-H3K9me3 (ab8898), anti-H3K36me3
(ab9050), anti-PI3K class 3 (ab124905), and anti-TFEB (ab2636) antibodies were
purchased from Abcam. Anti-AMPK (2532), anti-ATG12 (4180), anti-CARM1
(3379 for immunoblotting, 12495 for immunoprecipitation and ChIP), anti-LC3
(2775), anti-phospho-AMPKα T172 (2535), anti-phospho-FOXO3a S413 (8174),
anti-SQSTM1/p62 (5114), and anti-TFE3 (14779) antibodies were from Cell
Signaling Technology. Anti-SKP2 (sc-7164), anti-CUL1 (sc-17775), anti-tubulin
(sc-8035), and anti-Lamin A/C (sc-6215) were from Santa Cruz Biotechnology.
Anti-Flag (F3165), anti-ULK1 (A7481) and anti-β-actin (A1978) antibodies were
from Sigma, anti-HA antibody (MMS-101R) from Covance, and anti-tubulin
antibody (LF-PA0146A) from Abfrontier. The following chemicals were used in this
study: rapamycin (R-5000) was purchased from LC Laboratories, cycloheximide
(C4859), AICAR (A9978) and phenformin (P7045) from Sigma, bafilomycin A1
(11038) and ellagic acid (10569) from Cayman, compound C (171260) from
Calbiochem, and MG132 (M-1157) from A.G. Scientific.

Cell culture and generation of shRNA knockdown cells. HEK293T, HeLa and
HepG2 cells, and wild-type, Carm1 knockout, Carm1 knock-in, Ampk DKO and
Foxo1/3/4 TKO MEFs were cultured at 37 °C in DMEM containing 10% fetal bovine
serum (FBS) and antibiotics in a humidified incubator with 5% CO2. All cell lines
used in the study were regularly tested for mycoplasma contamination. For glucose
starvation, cells were washed with PBS, then incubated with glucose-free
DMEM supplemented with 10% dialysed FBS. Transfection was performed with
Turbofect (Fermentas) or Lipofectamine 3000 (Invitrogen) according to the
manufacturer's protocol. To generate knockdown cells, lentiviral shRNA constructs
were first transfected along with viral packaging plasmids (psPAX2 and
pMD2.G) into HEK293T cells. Three days after transfection, viral supernatant
was filtered through a 0.45-μm filter and used to infect target cells. Infected
cells were then selected with 5 μg ml−1 puromycin. The targeting sequences of
shRNAs are as follows.
mCARM1-1; 5'-TCAGGGACATGTCTGCTTATT-3'
mCARM1-2; 5'-GCCTGAGCAAGTGGACATTAT-3'
mTFE3-1; 5'-GTGGATTACATCCGCAAATTA-3'
mTFE3-2; 5'-TGTGGATTACATCCGCAAATT-3'
mTFEB-1; 5'-GCAGGCTGTCATGCATTATAT-3'
mTFEB-2; 5'-CCAAGAAGGATCTGGACTTAA-3'
mSKP2; 5'-GCAAGACTTCTGAACTGCTAT-3'
hCUL1-1; 5'-GATTTGATGGATGAGAGTGTA-3'
hCUL1-2; 5'-CCCGCAGCAAATAGTTCATGT-3'
hSKP2-1; 5'-TTCCGCTGCCCACGATCATTT-3'
hSKP2-2; 5'-AGTCGGTGCTATGATATAATA-3'
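Because the shRNA target sequences pass through several copy steps before an oligonucleotide order, a quick sanity check on length and alphabet can catch transcription errors. The sketch below is illustrative only: the sequences are those listed above with extraction whitespace removed, the expected length of 21 nt is an observation about this list rather than a stated design rule, and the names `validate_targets` and `gc_fraction` are hypothetical.

```python
# Sketch: sanity-check shRNA target sequences (length, alphabet, GC content).
SHRNA_TARGETS = {
    "mCARM1-1": "TCAGGGACATGTCTGCTTATT",
    "mCARM1-2": "GCCTGAGCAAGTGGACATTAT",
    "mTFE3-1": "GTGGATTACATCCGCAAATTA",
    "mTFE3-2": "TGTGGATTACATCCGCAAATT",
    "mTFEB-1": "GCAGGCTGTCATGCATTATAT",
    "mTFEB-2": "CCAAGAAGGATCTGGACTTAA",
    "mSKP2": "GCAAGACTTCTGAACTGCTAT",
    "hCUL1-1": "GATTTGATGGATGAGAGTGTA",
    "hCUL1-2": "CCCGCAGCAAATAGTTCATGT",
    "hSKP2-1": "TTCCGCTGCCCACGATCATTT",
    "hSKP2-2": "AGTCGGTGCTATGATATAATA",
}


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def validate_targets(targets, expected_len=21):
    """Return names whose sequence has an unexpected length or non-ACGT characters."""
    problems = []
    for name, seq in targets.items():
        if len(seq) != expected_len or set(seq) - set("ACGT"):
            problems.append(name)
    return problems


print("problem sequences:", validate_targets(SHRNA_TARGETS))
for name, seq in SHRNA_TARGETS.items():
    print(f"{name}\t{len(seq)} nt\tGC {gc_fraction(seq):.2f}")
```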

Animal studies. All animal studies and procedures were approved by the 
Institutional Animal Care and Use Committee (IACUC) of Seoul National 
University. Eight-to-ten-week-old male wild-type C57BL/6J mice were injected
with vehicle (PEG400) or ellagic acid (10 mg kg−1 day−1) intraperitoneally for four
consecutive days. Mice were then fed ad libitum or fasted for 24 h. Liver tissues
were collected after mice were euthanized. Sample sizes were at least n = 3 to allow 
for statistical analysis. 

Whole-cell lysate preparation and subcellular fractionation. All cells were 
briefly rinsed with ice-cold PBS before collection. For whole-cell lysates, the cells 
were resuspended in RIPA buffer (150 mM NaCl, 1% Triton X-100, 1% sodium 
deoxycholate, 0.1% SDS, 50 mM Tris-HCl (pH 7.5), and 2mM EDTA (pH 8.0)) 
supplemented with protease inhibitors and sonicated using a Branson Sonifier 
450 at output 3 and a duty cycle of 30 for five pulses. For cytosolic and nuclear 
fractions, cells were lysed in harvest buffer (10 mM HEPES (pH 7.9), 50 mM 
NaCl, 0.5 M sucrose, 0.1 mM EDTA, 0.5% Triton X-100 and freshly added DTT, 
PMSF and protease inhibitors), incubated on ice for 5 min and spun at 120g for 
10 min at 4°C. The supernatant (cytosolic fraction) was removed to a separate 
tube. The nuclear pellet was rinsed twice with 500 μl of buffer A (10 mM HEPES
(pH 7.9), 10 mM KCl, 0.1 mM EDTA, and 0.1 mM EGTA) and spun down at
120g for 10 min at 4 °C. The supernatant was discarded and the pellet (nuclear
fraction) was resuspended in RIPA buffer and sonicated as for the whole-cell
lysates. All lysates were quantified by the Bradford method and analysed by 
SDS-PAGE. 

Electron microscopy. Cells were fixed in 0.1 M sodium cacodylate containing 4% 
glutaraldehyde, 1% paraformaldehyde for 1h at room temperature. After washing 
three times with 0.1 M sodium cacodylate, cells were dehydrated through a gradient 
series of ethanol, 20 min each step, starting from 50% ethanol and ending with 100% 
ethanol. Afterwards, cells were incubated with progressively concentrated propylene 
oxide dissolved in ethanol then infiltrated with increasing concentration of Eponate 
812 resin. Samples were baked in a 65°C oven overnight then sectioned using an 
Ultra microtome. Sections were viewed with an energy filtering TEM unit (LEO- 
192AB OMEGA, Carl Zeiss) at the Korean Basic Science Institute, South Korea. 


Immunofluorescence. Immunocytochemistry was performed as previously
described. Cells grown on coverslips at a density of 7 × 10^4 cells were washed
three times with PBS and then fixed with 2% paraformaldehyde in PBS for 10 min 
at room temperature. Fixed cells were permeabilized with 0.1% Triton X-100 in 
PBS (PBS-T) for 10 min at room temperature. Blocking was performed with 3% 
bovine serum in PBS-T for 30 min. For staining, cells were incubated with antibod- 
ies for 2h at room temperature, followed by incubation with fluorescent labelled 
secondary antibodies for 1 h (Invitrogen). Cells were mounted and visualized under 
a confocal microscope (Zeiss, LSM700). For autophagy studies, MEFs were trans- 
fected with GFP-LC3 and sub-cultured onto coverslips. The following day, cells 
were incubated with either complete media or glucose starvation media for 18h. 
Cells were treated with rapamycin or ellagic acid for 18h. For BiFC experiments, 
pHA-CARM1-VC155 and pFlag-TFEB-VN173 constructs were used. 
Ubiquitination assay. Ubiquitination assay was performed as previously
described. Cells were transfected with combinations of plasmids including
HisMax-tagged ubiquitin. After incubation for 48 h, cells were treated with
5 μg ml−1 of MG132 for 4 h, lysed in buffer A (6 M guanidinium-HCl, 0.1 M
Na2HPO4/NaH2PO4, 0.01 M Tris-HCl (pH 8.0), 5 mM imidazole, and 10 mM
β-mercaptoethanol), and incubated with Ni2+-NTA beads (QIAGEN) for 4 h at
room temperature. The beads were sequentially washed with buffer A, buffer B
(8 M urea, 0.1 M Na2HPO4/NaH2PO4, 0.01 M Tris-HCl (pH 8.0), and 10 mM
β-mercaptoethanol), and buffer C (8 M urea, 0.1 M Na2HPO4/NaH2PO4, 0.01 M
Tris-HCl (pH 6.3), and 10 mM β-mercaptoethanol). Bound proteins were eluted
with buffer D (200 mM imidazole, 0.15 M Tris-HCl (pH 6.7), 30% glycerol, 0.72 M
β-mercaptoethanol, and 5% SDS), and subjected to immunoblot analysis. Ubiquitination
site prediction software was used for CARM1 ubiquitination site prediction.
Bacterial expression and GST pull-down assay. Glutathione S-transferase (GST)- 
tagged constructs were transformed in Rosetta Escherichia coli and purified with 
glutathione beads (GE Healthcare). 35S-methionine-labelled TFEB deletions or
CARM1 deletions were generated using TNT Quick Coupled Transcription/ 
Translation system (Promega) according to the manufacturer’s guidance. Purified 
proteins and in vitro translated proteins were diluted in binding buffer (125 mM 
NaCl, 20 mM Tris (pH 7.5), 10% glycerol, 0.1% NP-40, 0.5mM DTT supple- 
mented with protease inhibitors) for GST pull-down experiment. Samples were 
then washed four times with dilution buffer and boiled with SDS sample buffer 
for immunoblotting analysis. 

In vitro kinase assay. GST-SKP2 and beclin (1-148 amino acids) were purified
using glutathione beads and eluted in elution buffer (50 mM Tris-HCl (pH 8.0),
100 mM NaCl, 10 mM L-glutathione reduced (Sigma)). HA-AMPKα1 constitutively
active (CA) was co-transfected in HEK293T cells with Flag-AMPKβ and
HA-AMPKγ, and the complex was immunoprecipitated using Flag-M2 beads
(Sigma) and eluted with 3×Flag peptide in elution buffer (0.1 mg ml−1 in TBS).
Then 1 μg of each substrate was reacted with AMPK complexes in kinase reaction
buffer containing 20 mM HEPES (pH 7.4), 5 mM MgCl2, 1 mM EGTA, 0.4 mM
EDTA and 0.05 mM DTT, as previously described. Reactions were incubated
with 150 μM AMP and 2 μCi of radiolabelled [γ-32P]ATP at 30 °C for 15 min. The
reactions were terminated by adding SDS sampling buffer, and phosphorylation
was detected by SDS-PAGE and autoradiography.

Construction of reporter plasmids and luciferase assays. The Skp2 promoter 
region (from 1 kb upstream of transcription start site to 200 bp downstream) and 
2x CLEAR (GTCACGTGACCCCAGGGTCACGTGAC) sequence (under- 
lined bases denote the known sequence of the CLEAR element) were cloned into 
pGL2-luciferase reporter vector (Promega). FOXO response element (FRE) mutant 
at the Skp2 promoter was constructed by site-directed mutagenesis. MEFs were 
transiently transfected with luciferase reporter plasmids and luciferase activity was 
measured 36 h after transfection and normalized by β-galactosidase expression.
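Normalizing reporter activity to a co-transfected β-galactosidase control, as described above, amounts to a ratio of two readings per well, usually followed by expressing each condition relative to a control condition. The Python below is a minimal sketch of that arithmetic. The function names and the example readings are hypothetical and are not taken from the paper.

```python
def normalized_activity(luciferase: float, beta_gal: float) -> float:
    """Luciferase signal divided by the beta-galactosidase signal of the same well."""
    if beta_gal <= 0:
        raise ValueError("beta-galactosidase reading must be positive")
    return luciferase / beta_gal


def fold_change(sample, control) -> float:
    """Normalized activity of `sample` relative to `control`.

    Each argument is a (luciferase, beta_gal) pair of raw readings.
    """
    return normalized_activity(*sample) / normalized_activity(*control)


# Hypothetical raw readings: (luciferase, beta-galactosidase).
complete_medium = (1000.0, 2.0)
glucose_starved = (500.0, 2.0)

print(fold_change(glucose_starved, complete_medium))
```

Dividing by the β-galactosidase reading corrects for well-to-well differences in transfection efficiency before conditions are compared.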
qRT-PCR. Total RNAs were extracted using Trizol (Invitrogen) and reverse
transcription was performed from 2.5 μg total RNAs using the M-MLV cDNA
Synthesis kit (Enzynomics). The abundance of mRNA was detected by an ABI
prism 7500 system or BioRad CFX384 with SYBR TOPreal qPCR 2x PreMix
(Enzynomics). The quantity of mRNA was calculated using the ΔΔCt method
and Hprt, Gapdh and Actb were used as controls. mRNA levels from mouse liver
tissues were normalized by the 36B4 (also known as Rplp0) gene. All reactions
were performed as triplicates.
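For readers unfamiliar with the ΔΔCt calculation, the following Python sketch shows the standard arithmetic: the target Ct is first referenced to a control gene within each sample, the treated sample is then referenced to the untreated one, and the result is converted to a fold change as 2^(−ΔΔCt). It assumes a single reference gene and 100% amplification efficiency. The Ct values are hypothetical and the function names are illustrative.

```python
from statistics import mean


def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the delta-delta-Ct method.

    Assumes one reference gene and perfect doubling per cycle.
    """
    delta_sample = ct_target_sample - ct_ref_sample      # dCt, treated sample
    delta_control = ct_target_control - ct_ref_control   # dCt, control sample
    delta_delta = delta_sample - delta_control            # ddCt
    return 2.0 ** (-delta_delta)


def mean_ct(replicates):
    """Average technical replicates (for example triplicate wells)."""
    return mean(replicates)


# Hypothetical triplicate Ct values.
target_starved = mean_ct([22.1, 21.9, 22.0])
ref_starved = mean_ct([18.0, 18.1, 17.9])
target_fed = mean_ct([24.0, 24.1, 23.9])
ref_fed = mean_ct([18.0, 17.9, 18.1])

print(relative_expression(target_starved, ref_starved, target_fed, ref_fed))
```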

The following mouse primers were used in this study.
Actb; fwd 5'-TAGCCATCCAGGCTGTGCTG-3', rev 5'-CAGGATCTTCATGAGGTAGTC-3'
Gapdh; fwd 5'-CATGGCCTTCCGTGTTCCTA-3', rev 5'-CCTGCTTCACCACCTTCTTGA-3'
Hprt; fwd 5'-GCTGGTGAAAAGGACCTCTCG-3', rev 5'-CCACAGGACTAGAACACCTGC-3'
36B4; fwd 5'-CAACCCAGCTCTGGAGAAAC-3', rev 5'-CCAACAGCATATCCCGAATC-3'
Ulk1; fwd 5'-GCTCCGGTGACTTACAAAGCTG-3', rev 5'-GCTGACTCCAAGCCAAAGCA-3'
Map1lc3b; fwd 5'-CACTGCTCTGTCTTGTGTAGGTTG-3', rev 5'-TCGTTGTGCCTTTATTAGTGCATC-3'
Atg12; fwd 5'-TCCGTGCCATCACATACACA-3', rev 5'-TAAGACTGCTGTGGGGCTGA-3'
Atg13; fwd 5'-CCAGGCTCGACTTGGAGAAAA-3', rev 5'-AGATTTCCACACACATAGATCGC-3'
Atg14; fwd 5'-AGCGGTGATTTCGTCTATTTCG-3', rev 5'-GCTGTTCAATCCTCATCTTGCAT-3'
Sirt1; fwd 5'-GATACCTTGGAGCAGGTTGC-3', rev 5'-CTCCACGAACAGCTTCACAA-3'
Sqstm1; fwd 5'-ATGTGGAACATGGAGGGAAGA-3', rev 5'-GGAGTTCACCTGTAGATGGGT-3'
Vps11; fwd 5'-AAAAGAGAGACGGTGGCAATC-3', rev 5'-AGCCCAGTAACGGGATAGTTG-3'
Atp6v1c1; fwd 5'-ACTGAGTTCTGGCTCATATCTGC-3', rev 5'-TGGAAGAGACGGCAAGATTATTG-3'
Hexb; fwd 5'-CTGGTGTCGCTAGTGTCGC-3', rev 5'-CAGGGCCATGATGTCTCTTGT-3'
Neu1; fwd 5'-GGACCGCTGAGCTATTGGG-3', rev 5'-CGGGATGCGGAAAGTGTCTA-3'
Mcoln1; fwd 5'-CTGACCCCCAATCCTGGGTAT-3', rev 5'-GGCCCGGAACTTGTCACAT-3'
Ctns; fwd 5'-ATGAGGAGGAATTGGCTGCTT-3', rev 5'-ACGTTGGTTGAACTGCCATTTT-3'
Hspa5; fwd 5'-ACTTGGGGACCACCTATTCCT-3', rev 5'-ATCGCCAATCAGACGCTCC-3'
Skp2; fwd 5'-CCTCCAAGGAAACGAGTCAAG-3', rev 5'-CAGGAGACACCTGGAAAGTTC-3'
Tfeb; fwd 5'-AAGGTTCGGGAGTATCTGTCTG-3', rev 5'-GGGTTGGAGCTGATATGTAGCA-3'
Tfe3; fwd 5'-TGCGTCAGCAGCTTATGAGG-3', rev 5'-AGACACGCCAATCACAGAGAT-3'
Ampkα1; fwd 5'-GTCAAAGCCGACCCAATGATA-3', rev 5'-CGTACACGCAAATAATAGGGGTT-3'
Ampkα2; fwd 5'-CAGGCCATAAAGTGGCAGTTA-3', rev 5'-AAAAGTCTGTCGGAGTGCTGA-3'

The following human primers were used in this study.
ACTB; fwd 5'-ATTGCCGACAGGATGCAGAA-3', rev 5'-ACATCTGCTGGAAGGTGGACAG-3'
GAPDH; fwd 5'-CGACCACTTTGTCAAGCTCA-3', rev 5'-AGGGGAGATTCAGTGTGGTG-3'
HPRT; fwd 5'-TGACACTGGCAAAACAATGCA-3', rev 5'-GGTCCTTTTCACCAGCAAGCT-3'
SKP2; fwd 5'-ATGCCCCAATCTTGTCCATCT-3', rev 5'-CACCGACTGAGTGATAGGTGT-3'
AMPKA1; fwd 5'-TTTGCGTGTACGAAGGAAGAAT-3', rev 5'-CTCTGTGGAGTAGCAGTCCCT-3'
AMPKA2; fwd 5'-CTGTAAGCATGGACGGGTTGA-3', rev 5'-AAATCGGCTATCTTGGCATTCA-3'

RNA-seq and ChIP-seq analyses. The TruSeq method was used to generate
RNA-seq libraries. ChIP-seq libraries were prepared using the NEXTflex ChIP-seq
kit (Bioo Scientific), according to the manufacturer's instructions. RNA-seq
libraries were paired-end sequenced and ChIP-seq libraries were single-end
sequenced on an Illumina HiSeq 2500 (NICEM, Seoul National University). All
RNA-seq data were mapped to the mouse genome (mm9) using the TopHat package.
Differential expression analysis was performed using the edgeR package.
Differentially regulated
genes were identified using a false discovery rate (FDR) cut-off of 1 x 10~° for 
knockout against knockout-glucose starvation (KO-GS), wild type against
wild-type-glucose starvation (WT-GS), wild type against knockout, and WT-GS
against KO-GS. We performed hierarchical clustering on the previously selected
differential genes, using their expression values from all conditions and
replicates. Specifically, we used Ward's criterion for genes with
1 − (correlation coefficient) as the distance measure. The clustering heatmap
was drawn using z-scores scaled across samples for each gene. ChIP-seq data
were mapped to the mouse genome using Bowtie. The tracks were generated using
uniquely aligned reads. At promoters, genes were sorted by expression level,
which indicated that H3R17me2 as well as H3K4me3 were enriched at active
promoters. We used 8,398 distal (<2.5 kb from annotated TSSs) CBP and MED12
binding sites for enhancers, which were sorted by H3K27ac level. H3R17me2 was
not detected at enhancers. The data on H3R17me2, H3K4me1, H3K4me3 and H3K27ac
were obtained from MEFs under normal conditions.
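
The two quantities named above, the per-gene z-score scaled across samples and the 1 − (correlation coefficient) distance between genes, can be sketched as follows. This is an illustrative sketch, not the authors' pipeline: the function names are hypothetical, Pearson correlation is assumed (the text does not say which coefficient was used), and the population standard deviation is assumed for scaling.

```python
import math


def zscore(values):
    """Scale one gene's expression values across samples (assumes population s.d.)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sd == 0:
        # A gene with constant expression carries no pattern to display.
        return [0.0] * n
    return [(v - mean) / sd for v in values]


def correlation_distance(x, y):
    """Return 1 - Pearson correlation between two expression profiles (Pearson is an assumption)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return 1.0 - cov / math.sqrt(var_x * var_y)
```

A pairwise matrix of such distances could then be passed to a Ward-linkage routine, for example `scipy.cluster.hierarchy.linkage` with `method="ward"`, to obtain the gene clusters.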

ChIP, two-step ChIP assays, and qRT-PCR analyses. The ChIP and sequential
two-step ChIP assays were conducted as previously described. In brief,
cells were crosslinked with 1% formaldehyde for 10 min at room temperature.
After glycine quenching, the cell pellets were lysed in buffer containing 50 mM
Tris-HCl (pH 8.1), 10 mM EDTA, 1% SDS, supplemented with complete protease
inhibitor cocktail (Roche), and sonicated. Chromatin extracts containing DNA
fragments with an average size of 250 bp were then diluted tenfold with dilution
buffer containing 1% Triton X-100, 2 mM EDTA, 150 mM NaCl and 20 mM
Tris-HCl (pH 8.1) with complete protease inhibitor cocktail, pre-cleared with
protein A/G sepharose and subjected to immunoprecipitation overnight at 4 °C.
Immunocomplexes were captured by incubation with 45 µl of protein A/G
sepharose for 2 h at 4 °C. Beads were washed with low-salt wash buffer (0.1% SDS,
1% Triton X-100, 2 mM EDTA, 20 mM Tris-HCl (pH 8.1), 150 mM NaCl),
high-salt wash buffer (0.1% SDS, 1% Triton X-100, 2 mM EDTA, 20 mM Tris-HCl
(pH 8.1), 500 mM NaCl), buffer III (0.25 M LiCl, 1% NP-40, 1% deoxycholate,
10 mM Tris-HCl (pH 8.1), 1 mM EDTA) and TE buffer (10 mM Tris-HCl (pH 8.0),
0.5 M EDTA), and eluted in elution buffer (1% SDS, 0.1 M NaHCO3). The
supernatant was incubated overnight at 65 °C to reverse the crosslinks, and then
digested with RNase A for 2 h at 37 °C and proteinase K for 2 h at 55 °C. ChIP
and input DNA were then purified and analysed by qRT-PCR or used for
constructing sequencing libraries. For the two-step ChIP assays, components were
eluted from the first immunoprecipitation reaction by incubation with 10 mM
DTT at 37 °C for 30 min and diluted 1:50 in ChIP dilution buffer, followed by
re-immunoprecipitation with the second antibodies. The two-step ChIP assay was
otherwise performed in essentially the same way as the first
immunoprecipitation. qPCR was used to measure enrichment of bound DNA, and
enrichment was calculated relative to input and expressed as a ratio to IgG.
All reactions were performed in triplicate. The following primers were used
in ChIP assays.
Skp2 (FRE); fwd 5’-CCTTAGGACTGGGTCTGTGG-3’, rev 5’-GCACGCTGATTTGATCTTCA-3’;
Map1lc3b; fwd 5’-AGCCAGTGGGATATTGGTCT-3’, rev 5’-AGAGCCTGCGGTACCCTAC-3’;
Atg14; fwd 5’-GAGACGCCATGATGATCTGA-3’, rev 5’-GCCAAGGAGTGTGGGAAGTA-3’;
Atp6v1c1; fwd 5’-ACTCAGTGGCAGAAGGGAGA-3’, rev 5’-AAACACCCAGTGGAGACTGC-3’;
Hexb; fwd 5’-GAATTGGGACTGTGGTCGAT-3’, rev 5’-CTAGTGTCGCTGGCCCTAGT-3’;
Hspa5; fwd 5’-ATTGGTGGCCGTTAAGAATG-3’, rev 5’-TGAAGTCGCTACTCGTTGGA-3’;
Ctns; fwd 5’-CCTCTGGTAGCGTAGGT-3’, rev 5’-GCTTTTGGTGAGGTCTGTCC-3’;
Vps11; fwd 5’-GGGCCGATCTTAACCTTTGT-3’, rev 5’-AGCCCAGATGTCTTTTGTGG-3’;
Neu1; fwd 5’-AGGATGACTTCAGCCTGGTG-3’, rev 5’-AGGATAGTATGGGCCGAACC-3’;
Mcoln1; fwd
Extended Data Figure 1 | Increased H3R17me2 by CARM1 in amino
acid starvation-induced autophagy. a, b, Immunoblot analysis of various
histone marks in response to amino acid (AA) starvation or rapamycin
(100 nM). c, Immunoblot analysis of CARM1 and LC3 conversion
(LC3-II). d, Amino acid-starved wild-type, Carm1 knockout or knock-in
MEFs were analysed by immunoblot. e, Representative confocal images
of GFP-LC3 puncta formation. GFP-LC3 (green); DAPI (blue). Scale
bar, 20 µm. The graph shows quantification of LC3-positive punctate
cells (right). Bars, mean ± s.e.m.; n = 5, with over 100 cells. **P < 0.01
(one-tailed t-test).
Extended Data Figure 2 | Loss of CARM1 and inhibition of H3R17me2
impair autophagy. a, LC3 flux was analysed in MEFs infected with
nonspecific shRNA (shNS) or CARM1 shRNAs (shCARM1-1 and -2).
Bafilomycin A1 (BafA1; 200 nM, 2 h). The LC3-II/LC3-I ratio is indicated.
b, LC3 flux was analysed in wild-type and Carm1 knockout MEFs in
the absence or presence of Bafilomycin A1. The LC3-II/LC3-I ratio is
indicated. c, mCherry-GFP-LC3 was transfected in wild-type and Carm1
knockout MEFs and the formation of autophagosome (mCherry-positive;
GFP-positive) and autolysosome (mCherry-positive; GFP-negative)
was examined. Scale bar, 20 µm. d, Immunoblot analysis in MEFs.
e, Representative confocal images of GFP-LC3 puncta formation.
Scale bar, 10 µm. Bars, mean ± s.e.m.; n = 5, over 150 cells. *P < 0.05
(one-tailed t-test). f, Immunoblot analysis in MEFs.
Extended Data Figure 3 | CARM1 is degraded by SKP2-containing
SCF E3 ligase in the nucleus. a, Wild-type CARM1 and ubiquitination-defective
mutant K471R were analysed for their expression in MEFs after
MG132 treatment. b, Interaction between CARM1 and CUL proteins was
analysed. c, Lysates were analysed by immunoblot. d, Left, HepG2 cells
infected with two different SKP2 shRNAs were subject to cycloheximide
(CHX) experiment. Right, protein half-life of CARM1 was quantitatively
defined. e, Left, CHX experiment in HepG2 expressing wild-type
SKP2 or ΔF mutant. Right, protein half-life of CARM1 was quantitatively
defined. Data are mean ± s.e.m.; n = 3. **P < 0.01 (one-tailed t-test) (d, e).
Extended Data Figure 4 | CARM1 is degraded by CUL1-containing
SCF E3 ligase in the nucleus under nutrient-rich condition. a, HepG2
cells transfected with Flag-CUL1 were deprived of glucose for 18 h and
treated with MG132 before collecting. Interaction between CARM1
and CUL1 was analysed. b, c, In vivo ubiquitination assay of CARM1
after knockdown of CUL1 (b) or overexpression of wild-type or K720R
mutant (MT) CUL1 (c). d, e, Left, HepG2 cells infected with two different
CUL1 shRNAs (d) or overexpressing wild-type or mutant CUL1 (e) were
subject to cycloheximide treatment. Right, protein half-life of CARM1
was quantitatively defined. Data are mean ± s.e.m.; n = 3. *P < 0.05,
Extended Data Figure 5 | AMPKα2 accumulates in the nucleus leading
to repression of SKP2 and stabilization of CARM1 under nutrient-starved
conditions. a, b, qRT-PCR of Ampka1 and Ampka2 in MEFs (a)
and HepG2 cells (b) upon glucose starvation. c, The nuclear AMPKα2
expression level was analysed in the absence or presence of MG132.
d, Binding between CARM1 and AMPK was assessed. e, In vitro
kinase assay with AMPK. f, MEFs were treated with AICAR (1 mM)
or phenformin (2 mM) for 4 h. The nuclear fraction was analysed
by immunoblot. g, MEFs were deprived of glucose in the absence or
presence of 10 µM compound C and the nuclear fraction was analysed
by immunoblot. h, Left, cycloheximide treatment in wild-type and Ampk
DKO MEFs. Right, protein half-life of CARM1 was quantitatively defined.
i, j, Ampk DKO MEF lysates were analysed by immunoblot. k, CARM1-CUL1
interaction was analysed after SKP2 knockdown in wild-type and
Ampk DKO MEFs. l, SKP2 expression levels were analysed in the absence
or presence of MG132. m, Foxo1/3/4 J MEFs infected with Cre virus were
analysed for Skp2 mRNA. n, SKP2 and phosphorylated FOXO3a were
analysed by immunoblot. o, ChIP assay of the Skp2 promoter. Data are
mean ± s.e.m.; n = 3. *P < 0.05, **P < 0.01 (one-tailed t-test) (a, b, h, m, o).
p, Representative confocal images. Scale bar, 20 µm.
Extended Data Figure 6 | Identification of CARM1 target genes by
RNA-seq and ChIP-seq analyses. a, Flow chart showing the strategy
of RNA-seq analysis. b, Hierarchical clustering results applied to
4,998 differentially expressed genes (DEGs). c, Autophagy-related and
lysosomal genes significantly observed in cluster 1. Hyper-geometric
P values were calculated. d, Genes from cluster 1 were analysed for
transcription factor (TF) motif enrichment at their promoter region
analysis of CARM1-dependent autophagy-related and lysosomal
genes. Data are mean ± s.e.m.; n = 3. *P < 0.05, **P < 0.01 (one-tailed
t-test). f, Enrichment of H3R17me2 at promoters (left) and enhancers
(right). The data on H3R17me2, H3K4me1, H3K4me3 and H3K27ac
were obtained from MEFs under normal condition. g, Increase in
H3R17me2 at promoters of genes from cluster 1 after glucose starvation.
h, Increased H3R17me2 levels in response to 18 h of glucose starvation at
the autophagy-related gene Map1lc3b. The direction of transcription is
Extended Data Figure 7 | Binding mapping of CARM1 and TFEB
and their target gene regulation in glucose starvation. a, Bimolecular
fluorescence complementation (BiFC) analysis of the CARM1-TFEB
interaction. Scale bar, 20 µm. b, Interaction between CARM1 and TFEB
was analysed in wild-type and Ampk DKO MEFs after glucose starvation.
c, d, In vitro GST pull-down assays for domain mapping of CARM1-TFEB
interaction. BHLH, basic helix-loop-helix; LZ, leucine zipper;
MD, methyltransferase domain; TA, transcription activation domain.
e, Endogenous co-immunoprecipitation from nuclear fraction of wild-type
MEFs. f, g, RT-PCR analysis in MEFs after knockdown of TFEB or TFE3.
h, i, RT-PCR analysis showing mRNA levels of TFEB-dependent and
CARM1-dependent genes after knockdown of TFEB (h) or CARM1 (i).
Bars, mean ± s.e.m.; n = 3. *P < 0.05, **P < 0.01 (one-tailed t-test) (f-i).
Extended Data Figure 8 | CARM1 functions as a co-activator of TFEB.
a, ChIP assays on TFEB-dependent, CARM1-dependent promoters
after knockdown of CARM1. b, ChIP assays of the Hspa5 promoter,
a TFEB-dependent, CARM1-independent target promoter. c, MEFs
were analysed with indicated antibodies. d, Two-step ChIP assays were
performed on promoters of TFEB-dependent, CARM1-dependent target
genes or TFEB-dependent, CARM1-independent target genes in MEFs
after 18 h of glucose starvation. The chromatin fractions were first subject
to pull-down with anti-TFEB antibody, eluted from immunocomplexes
and applied for the second pull-down with control IgG or anti-CARM1
antibody. Bars, mean ± s.e.m.; n = 3 (a, b, d). e, Representative confocal
images. Scale bar, 10 µm.
Extended Data Figure 9 | A subset of autophagy-related and lysosomal
genes regulated by TFEB requires CARM1. a, qRT-PCR analysis
showing mRNA levels of TFEB-dependent and CARM1-dependent
autophagy-related and lysosomal genes in wild-type and Ampk DKO MEFs
in response to glucose starvation. b, ChIP assays on TFEB-dependent,
CARM1-dependent target genes in wild-type and Ampk DKO MEFs.
c, qRT-PCR analysis of CARM1-dependent genes after knockdown of
SKP2 in Ampk DKO MEFs. d, qRT-PCR analysis was performed in
MEFs deprived of glucose in the absence or presence of H3R17me2-specific
inhibitor, ellagic acid. e, f, ChIP assays on TFEB-dependent,
CARM1-dependent promoters. Hspa5 promoter was also analysed as a
CARM1-independent promoter. Bars, mean ± s.e.m.; n = 3. *P < 0.05,
**P < 0.01, ***P < 0.001 (one-tailed t-test) (a-f).
Extended Data Figure 10 | Graphical summary of the AMPK-SKP2-CARM1
signalling cascade. Proposed model depicting the AMPK-SKP2-CARM1
signalling axis in the transcriptional and epigenetic regulation
of autophagy. The SKP2-containing SCF E3 ubiquitin ligase complex
degrades CARM1 under nutrient-rich conditions, but in nutrient-deprived
conditions, AMPK-dependent phosphorylation of FOXO3a downregulates
SKP2 and stabilizes CARM1, which in turn functions as a co-activator of
TFEB in regulation of autophagy.
Rocaglates convert DEAD-box protein eIF4A into a
sequence-selective translational repressor


Shintaro Iwasaki¹, Stephen N. Floor¹ & Nicholas T. Ingolia¹


Rocaglamide A (RocA) typifies a class of protein synthesis inhibitors
that selectively kill aneuploid tumour cells and repress translation
of specific messenger RNAs. RocA targets eukaryotic initiation
factor 4A (eIF4A), an ATP-dependent DEAD-box RNA helicase; its
messenger RNA selectivity is proposed to reflect highly structured
5’ untranslated regions that depend strongly on eIF4A-mediated
unwinding. However, rocaglate treatment may not phenocopy the
loss of eIF4A activity, as these drugs actually increase the affinity
between eIF4A and RNA. Here we show that secondary structure
in 5’ untranslated regions is only a minor determinant for RocA
selectivity and that RocA does not repress translation by reducing
eIF4A availability. Rather, in vitro and in cells, RocA specifically
clamps eIF4A onto polypurine sequences in an ATP-independent
manner. This artificially clamped eIF4A blocks 43S scanning,
leading to premature, upstream translation initiation and reducing
protein expression from transcripts bearing the RocA-eIF4A target
sequence. In elucidating the mechanism of selective translation
repression by this lead anti-cancer compound, we provide an
example of a drug stabilizing sequence-selective RNA-protein
interactions.

We analysed the global translational inhibition caused by RocA, as
well as its marked messenger RNA (mRNA) selectivity, using ribosome
profiling. RocA treatment of HEK 293 cells caused a dose-dependent
decrease in polysome formation and protein synthesis (Extended Data
Figs 1a and 2a). Translation was inhibited without 4EBP dephosphorylation
or eIF2α phosphorylation (Extended Data Fig. 1b), but partly rescued
by expression of RocA-resistant eIF4A proteins (Extended Data
Fig. 1c, d). We quantified the reduction in overall cytosolic ribosome
footprints after normalization of our ribosome profiling data against
footprints from the mitochondrial ribosome, which employs molecular
machinery distinct from the cytoplasmic translation apparatus
(Fig. 1a and Extended Data Fig. 1e-h). We saw that RocA sensitivity
varied widely across different transcripts (Fig. 1a, b and Supplementary
Table 1a, b). This mRNA-specific translational repression occurred even
at a low, therapeutically relevant concentration of RocA (30 nM),
correlated well between different drug concentrations, and was not
accompanied by significant changes in mRNA abundance (Extended
Data Fig. 2b-d and Supplementary Table 1c).
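
The normalization described above can be illustrated with a short sketch: each sample's cytosolic footprint count for a transcript is divided by that sample's total mitochondrial footprints before the treated and control values are compared. The function name and the use of raw counts are assumptions made for illustration; the actual analysis involves further steps not described here.

```python
import math


def translation_log2_change(treated, control, mito_treated, mito_control):
    """log2 translation change for one transcript after mitochondrial normalization.

    treated, control: cytosolic footprint counts for the transcript.
    mito_treated, mito_control: total mitochondrial footprints in each sample.
    (Hypothetical helper; raw counts are assumed for simplicity.)
    """
    if min(treated, control, mito_treated, mito_control) <= 0:
        raise ValueError("all counts must be positive")
    normalized_treated = treated / mito_treated
    normalized_control = control / mito_control
    return math.log2(normalized_treated / normalized_control)
```

With equal mitochondrial totals, a transcript falling from 200 to 50 footprints has a log2 change of −2. If the treated library simply had twice as many mitochondrial reads, an unchanged raw count would be read as a twofold reduction.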

Given that eIF4A acts during the scanning of the pre-initiation
43S complex along the 5’ untranslated region (UTR), we reasoned
that the varied RocA sensitivity of different mRNAs might be determined
by their 5’ UTR sequences. We confirmed that the 5’ UTRs of
selected mRNAs were sufficient to confer RocA sensitivity on a Renilla
luciferase reporter, while the scanning-independent HCV IRES
was totally resistant to the drug (Fig. 1c and Extended Data Fig. 2e).
However, RocA sensitivity did not reflect either the calculated
thermodynamic stability or experimentally derived DMS-Seq secondary
structure measurement of the 5’ UTR, and the presence of predicted
G-quadruplexes contributed only modestly (Extended Data Fig. 3).

Because RocA enhances the RNA affinity of eIF4A, we suspected
that it could induce effects beyond the simple loss of eIF4A activity.
Indeed, we found that the eIF4A inhibitor hippuristanol (Hipp), which
decreases the affinity between eIF4A and RNA, yields a different
spectrum of mRNA-specific repression (Extended Data Fig. 4a-e). The
mTOR inhibitor PP242, which inhibits formation of eIF4F (a complex
of eIF4E/G/A), represses a subset of these Hipp-sensitive mRNAs
(Extended Data Fig. 4f, g). Thus, RocA exerts effects beyond reduced
eIF4A activity, particularly at low, therapeutic doses.

We next asked how RocA affected eIF4A occupancy across the
transcriptome in cells by sequencing transcripts that co-purified with
streptavidin binding peptide (SBP)-tagged eIF4A (Extended Data
Fig. 5) (RNA-immunoprecipitation sequencing (RIP-seq)). Increasing


[Figure 1 graphic. a, Histogram of transcript number (n = 4,564) against
log2(translation change normalized to mitochondrial footprints); median
change 0.756-fold at 0.03 µM, 0.349-fold at 0.3 µM and 0.226-fold at 3 µM
RocA. b, Translation change against mean expression; low-sensitivity mRNAs,
n = 595; high-sensitivity mRNAs, n = 721. c, Reporter activity against RocA
(µM) for high-sensitivity, low-sensitivity and scanning-independent (HCV
IRES) 5’ UTRs; labelled reporters include EIF2S3 (22 nt) and HNRNPC (244 nt).
d, log2(translation change) at 0.03 µM RocA.]

Figure 1 | RNA sequence selectivity is imparted upon eIF4A by RocA
causing selective translation repression. a, Histogram of the number
of transcripts along translation -fold change by ribosome profiling when
cells are treated with 0.03, 0.3, or 3 µM RocA, normalized to the number of
mitochondrial footprints. Median -fold change is shown. Bin width is 0.1.
b, MA plot of mean footprint reads between 3 µM RocA treatment and
non-treatment normalized to library sizes versus translation -fold change
by 3 µM RocA treatment, highlighting high- and low-sensitivity mRNAs.
c, The 5’ UTRs of indicated genes were fused to Renilla luciferase and
these reporter mRNAs were transfected before treatment with RocA
as indicated. Data represent mean and s.d. (n = 3). d, Correlation of
translation -fold change to RIP -fold change with RocA treatment.
ρ, Spearman’s rank correlation.
Figure 2 | RNA Bind-n-Seq and iCLIP reveal that RocA preferentially
increases the affinity between eIF4A and polypurine motif.
a, Correlations between tetramer motif enrichment in Bind-n-Seq by 0.03 µM
RocA treatment and motif prediction of 0.03 µM RocA effect in RIP-seq.
b, Highest-scoring elements in Bind-n-Seq and RIP-seq. c, The change in
mRNA binding for mRNAs with or without the enriched tetramer motif
(b) in their 5’ UTRs is shown as the RIP -fold change by RocA normalized
to spike-in RNA. Significance is calculated by Mann-Whitney U-test.
d, Enrichment of tetramer motifs (b) in iCLIP by RocA treatment relative
to control dimethylsulfoxide (DMSO) treatment. e, The frequency of the
tetramer motif (b) in the 5’ UTR predicts whether an mRNA is high- or
low-sensitivity, on the basis of the difference in cumulative distributions of
motifs in the 5’ UTR. Significance is calculated by Mann-Whitney U-test.
f, Reporter assay in HEK 293 cells with a CAA-repeat 5’ UTR containing
seven polypurine motif (AGAGAG) insertions (Extended Data Fig. 9a).
Data represent mean and s.d. (n = 3).


RocA doses elevated the overall amount of RNA that co-purified with
SBP-tagged eIF4A (Extended Data Fig. 5d), and greatly changed the
abundance of individual transcripts, leading to 15-fold or larger
differences between mRNAs. Strikingly, enhanced eIF4A binding in the
presence of RocA correlated strongly with translation inhibition by
RocA (Fig. 1d and Extended Data Fig. 5f), suggesting that a selective
increase of the eIF4A-RNA affinity underlies the specific translation
inhibition caused by RocA.

This mRNA selectivity led us to explore the sequence preferences of
eIF4A in the absence and presence of RocA. We measured the RNAs
that bound to eIF4A out of a random pool of oligonucleotides using
deep sequencing (RNA Bind-n-Seq) (Extended Data Fig. 6a-c). We
then calculated the enrichment of 4- to 6-nucleotide (nt) motifs in
RNAs retained on eIF4A, as DEAD-box RNA helicases typically contact
6 nt (ref. 17). The motifs enriched from randomized synthetic RNA by
Bind-n-Seq also predicted RIP-seq enrichments of endogenous transcripts
(Fig. 2a and Extended Data Fig. 6d). In both experiments, RocA
greatly enhanced binding to short polypurine sequences (Fig. 2b, c and
Extended Data Fig. 6e). Although drug-free eIF4A also had intrinsic
RNA sequence preferences (Extended Data Fig. 6g) and transcripts
containing these preferred sequences were relatively resistant to Hipp
treatment (Extended Data Fig. 6h), RocA selectively increased
binding only to a subset of sequences containing polypurine stretches
(Extended Data Fig. 6g).
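
The motif enrichment calculation described above can be sketched in its simplest form: the frequency of each k-mer among protein-bound reads divided by its frequency in the input pool. This is a minimal illustration under that assumption, with hypothetical function names; the published analysis may use different counting or normalization rules.

```python
from collections import Counter


def kmer_frequencies(reads, k):
    """Return the fraction of all k-mer occurrences contributed by each k-mer."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    total = sum(counts.values())
    return {kmer: count / total for kmer, count in counts.items()}


def kmer_enrichment(bound_reads, input_reads, k=4):
    """Ratio of k-mer frequency in bound reads to frequency in the input pool.

    k-mers absent from either library are skipped rather than given a
    pseudocount (an assumption made to keep the sketch short).
    """
    bound = kmer_frequencies(bound_reads, k)
    pool = kmer_frequencies(input_reads, k)
    return {kmer: bound[kmer] / pool[kmer] for kmer in bound if kmer in pool}
```

A value above 1 means the motif is over-represented among RNAs retained on the protein; comparing such ratios with and without drug would highlight motifs whose binding the drug enhances.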

Polypurine motifs were also enriched in the eIF4A binding sites
detected by photocrosslinking and immunoprecipitation (iCLIP)
after RocA treatment (Fig. 2d and Extended Data Fig. 7), and in the
5’ UTRs of translationally RocA-sensitive mRNAs (Fig. 2e). This striking
correspondence among in vitro binding to recombinant protein, ex vivo
co-purification, crosslinking in cells, and translational repression in
cells led us to hypothesize that selective binding to polypurine motifs
induced by RocA binding could explain mRNA-specific translational
repression. We then directly confirmed that inserting the polypurine
motif into an unstructured CAA repeat 5’ UTR (Extended Data
Fig. 9a) sensitized the reporter to RocA inhibition (Fig. 2f).
Figure 3 | RocA clamps eIF4A on polypurine motif even after ATP
hydrolysis. a, b, Direct measurement of the eIF4A/RNA affinity by
fluorescence polarization for eIF4A and 5’ FAM-labelled RNAs in the
presence or absence of RocA. Data represent mean and s.d. (n = 3).
c, Motif enrichments along entire tetramer motifs in Bind-n-Seq with
ADP + Pi and highest-scoring elements (inset). d, Competition assay with
unlabelled RNA. Data represent mean (n = 3). e, Ribosome toeprinting
assay performed in RRL in the presence of GMP-PNP in the presence or
absence of 3 µM RocA treatment. f, Relative RNase I cleavage protected by
eIF4A/RocA complex on mRNA containing one AGAGAG at the middle in
footprinting assay. See the original data in Extended Data Fig. 9g.


We found that RocA-induced, sequence-selective eIF4A binding
occurs through ATP-independent clamping that suffices to repress
translation of the clamped mRNA. The cycle of ATP-dependent RNA
binding and subsequent release upon ATP hydrolysis is necessary for
the efficient RNA remodelling activity of eIF4A as well as its role in
translation. Drug-free eIF4A bound RNA only in the presence of ATP
(AMP-PNP and ADP-BeFx) and transition state (ADP-AlF4) analogues
but not hydrolysis products (ADP + Pi). Remarkably, RocA clamped
eIF4A on polypurine RNA, but not CAA-repeat RNA, in an
ATP-independent manner (Fig. 3a, b and Extended Data Fig. 8a, b).
Bind-n-Seq performed with ADP + Pi likewise recovered
polypurine-enriched RNAs in the presence of RocA and no detectable RNA in the
absence of RocA (Fig. 3c and Extended Data Fig. 6i). RocA provided
polypurine-specific RNA binding activity to mutant eIF4A defective for
ATP binding (VX4GKT), which does not bind to RNA at all without
RocA (Extended Data Fig. 8d-f), and even to the truncated amino
(N)-terminal domain of eIF4A, albeit with lower affinity (Extended
Data Fig. 8g). The eIF4A/RocA complex dissociated far more slowly
from polypurine RNA than naive eIF4A, even in the presence of ATP,
whose hydrolysis ordinarily permits rapid dissociation (Fig. 3d). High
RNA affinity in the ADP-bound state can prolong RNA binding beyond
the time required for adenosine nucleotide exchange to restore the
high-affinity ATP-bound state and thus greatly reduce the effective
dissociation rate. This effective dissociation rate from polypurine RNA
measured in hydrolysable ATP (reflective of the intracellular
environment) becomes much slower than the ~1 min timescale of translation
initiation, and could serve to directly block the ribosome.

To probe how clamped eIF4A repressed translation, we recapitulated
RocA-induced, polypurine motif-specific translational repression
Figure 4 | eIF4A/RocA complexes on polypurine motifs block scanning
of pre-initiation complex, inducing uORF translation. a, Pre-formation
of the complex with RocA and eIF4A on the mRNA bearing seven
polypurine motifs represses the translation from the mRNA in RRL.
b, The supplementation of recombinant eIF4A protein to RRL in vitro
translation reaction with 10 µM Hipp or 3 µM RocA. c, In vitro translation
in RRL with mRNAs with native polio virus IRES and that with three
polypurine motifs (Extended Data Fig. 9a). d, Meta-gene analysis of
high-sensitivity transcripts to RocA. Reads are normalized to the sum
of mitochondrial footprints reads. Histogram of the position of the first
polypurine motif (hexamer) after uORF initiation codon (inset). P value is

in rabbit reticulocyte lysate (RRL) (Extended Data Fig. 9b, c). In this
system, RocA treatment represses the formation of 48S pre-initiation
complexes on the start codon of sensitive mRNAs, which we assessed
using a primer extension toeprinting assay (Fig. 3e and Extended
Data Fig. 9d). Surprisingly, we observed additional RocA-dependent
toeprints on the 5’ UTR, corresponding to the position of polypurine
motifs (Fig. 3e), even without eIF4F recruitment (Extended Data
Fig. 9e). We recapitulated these toeprints using only purified eIF4A
and drug, showing that they reflect eIF4A/RocA complexes clamped
directly onto polypurine motifs, bypassing its canonical recruitment
via cap and the eIF4F complex (Extended Data Fig. 9f). RNase I
footprinting revealed the full extent of the eIF4A protected region centred
on the motif (Fig. 3f and Extended Data Fig. 9g).

These eIF4F-independent eIF4A/RocA complexes directly repress
translation. We pre-formed such stable complexes on an mRNA during
a pre-incubation with recombinant eIF4A and RocA, and then showed
that they repressed its subsequent translation in the absence of free
RocA (Fig. 4a). Recombinant forms of eIF4A bearing mutations that
disrupt either ATP binding or eIF4G binding still retained the ability
to clamp onto polypurine RNA in the presence of RocA (Extended
Data Figs 8d-f, h, i and 9h-i) and repress translation from the RNA as
strongly as wild-type eIF4A/RocA complex (Extended Data Fig. 9j).
Furthermore, supplementation of recombinant eIF4A protein into an
in vitro translation reaction actually strengthened the repressive effect
of RocA (Fig. 4b and Extended Data Fig. 9k), confirming the dominant
repressive effect of the eIF4A/RocA complex. In contrast, translation
repression by Hipp, which decreases the affinity between eIF4A and
calculated by Fisher’s exact test. Bin width is 12 nt. e, Western blot of SBP
translated from uORF and downstream major ORF in RRL with 0.03 µM
RocA treatment. Quantification of bands normalized to long form with
DMSO treatment is shown. For gel source data, see Supplementary Fig. 1.
f, Schematic representation of RocA-mediated translation control.
RocA clamps eIF4A onto mRNA by selective affinity enhancement for a
polypurine motif in eIF4F-, cap-, and ATP-independent manners, which
then blocks scanning of pre-initiation complex, introducing premature
translation from uORF and inhibiting downstream ORF translation.
In b and c, data represent mean and s.d. (n = 3).


RNA and thereby mimics a loss of its function, was relieved by the 
addition of recombinant eIF4A. 

Assembly of an eIF4A/RocA complex could in principle repress
48S formation by blocking 40S attachment to the 5’ end of an mRNA
or subsequent 43S scanning along the 5’ UTR. Because the impact
of eIF4A/RocA bound to a single polypurine motif is unaffected by
its distance from the 5’ end (Extended Data Fig. 9a, l), we infer that
eIF4A/RocA bound to these motifs blocks 43S scanning. We also found
that eIF4A/RocA could inhibit translation from the polio virus internal
ribosome entry site (IRES), which bypasses ordinary 40S recruitment
but still depends on scanning (Extended Data Fig. 9a), when we
inserted polypurine motifs in the scanned region (Fig. 4c and Extended
Data Fig. 9m). Scanning inhibition suffices to explain repression by the
eIF4A/RocA complex, although our data do not exclude an additional
effect on 40S loading.

Impediments to 43S scanning by stable hairpins or RNA-binding
proteins can enhance the translation from upstream open reading
frames (uORFs) that otherwise would be skipped. We observed that
RocA treatment, but not Hipp treatment, caused an analogous
accumulation of translation on 5’ UTRs despite the global reduction in footprints
on protein-coding sequences (CDSes) (Extended Data Fig. 10a, b).
This enhancement occurred specifically on high-sensitivity transcripts
(Fig. 4d and Extended Data Fig. 10c). The uORFs activated by RocA
showed enrichment of a polypurine motif 20-30 nt downstream of the
uORF initiation codon (Fig. 4d, inset), reflecting the distance between
the start site and the leading edge of the scanning complex. We
tested directly whether eIF4A/RocA complexes on polypurine motifs
can drive cryptic upstream initiation using a reporter mRNA with
two alternative start sites that yield distinguishable protein isoforms.
Insertion of a polypurine motif 30 nt downstream of the earlier AUG
increased translation initiation from this codon upon RocA treatment
(Fig. 4e), confirming that clamped eIF4A/RocA complexes on polypurine
motifs drive upstream translation initiation. We found evidence
that this enhanced upstream initiation could contribute to
eIF4A/RocA-mediated repression of downstream CDSes, as RocA-sensitive
transcripts showed more pre-existing uORF initiation (Extended Data
Fig. 10d, e).
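
The positional analysis described above, locating the first polypurine motif downstream of a uORF initiation codon, can be sketched as follows. The sketch assumes that a polypurine hexamer means six consecutive purines (A or G) and measures the offset from the first base of the AUG; both the function name and these conventions are illustrative assumptions, not the authors' exact definitions.

```python
import re

# Assumption: a "polypurine hexamer" is any run of six consecutive A/G bases.
POLYPURINE_HEXAMER = re.compile(r"[AG]{6}")


def first_polypurine_offset(utr, start_index):
    """Distance in nt from the A of a uORF AUG to the first downstream polypurine hexamer.

    utr: 5' UTR sequence (RNA or DNA alphabet).
    start_index: 0-based position of the A of the uORF start codon.
    Returns None when no hexamer follows the start codon.
    """
    seq = utr.upper().replace("T", "U")
    if seq[start_index:start_index + 3] != "AUG":
        raise ValueError("start_index does not point at an AUG codon")
    match = POLYPURINE_HEXAMER.search(seq[start_index + 3:])
    if match is None:
        return None
    return match.start() + 3
```

Offsets collected over many uORFs could then be binned (the figure legend uses a 12 nt bin width) to check for enrichment in the 20-30 nt window.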

We have shown that RocA induces ATP-independent clamping of 
eIF4A onto polypurine sequences, creating an inhibitory roadblock for 
the scanning ribosome (Fig. 4f). Our identification of the eIF4A/RocA 
binding motif provides the first observation of a drug that stabilizes 
sequence-selective RNA-protein interactions*°. RocA may bind near 
the RNA interface on the N-terminal domain of eIF4A%, raising the 
possibility that the drug directly contacts purine bases of target RNAs. 
Alternatively, RocA might induce a conformational change leading 
to direct or indirect recognition of the polypurine motif by protein 
residues. Future structural insight into this polypurine selectivity may 
enable rocaglate derivatives with altered base selectivities that target 
different mRNAs.


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


No statistical methods were used to predetermine sample size. The experiments 
were not randomized. The investigators were not blinded to allocation during 
experiments and outcome assessment. 

General methods. HEK 293 Flp-In T-Rex cells (Invitrogen) were cultured 
and recombined according to manufacturer’s instructions. Stable integrants of 
SBP-tagged eIF4A were produced by co-transfection of these plasmids along with 
pOG44 by X-tremeGENE 9 (Roche) and selection using Hygromycin B. RocA, 
PP242, and thapsigargin were purchased from Sigma. Proteins and DNAs/RNAs 
were stained with GelCode Blue Stain Reagent (Thermo Scientific) and SYBR Gold 
Nucleic Acid Gel Stain (Invitrogen), respectively. 

Ribosome profiling. Library preparation and data analysis were performed 
according to the method previously described, which monitors mitochondrial 
ribosomes as well®*!. DMSO, RocA, Hipp, and PP242 were added to medium 
30 min before cell lysis. The libraries were sequenced on a HiSeq 2000/2500 
(Illumina). 

RIP-seq. Cells with tetracycline-inducible, SBP-tagged eIF4A integrated stably 
were plated in a 10 cm dish and cultured for 3 days with 1 µg ml⁻¹ tetracycline,
incubated with DMSO, 0.03 µM, or 0.3 µM RocA for 30 min, washed once with
5 ml of ice-cold PBS, lysed with 600 µl of lysis buffer (20 mM Tris-HCl pH 7.4,
150 mM NaCl, 5 mM MgCl2, and 1 mM DTT) containing 1% Triton X-100 and
25 U ml⁻¹ Turbo DNase I (Invitrogen), and then clarified by centrifugation for
10 min at 20,000g, 4°C. The supernatant was incubated with 60 µl of Dynabeads
M-270 Streptavidin (Invitrogen) equilibrated with lysis buffer containing 1%
Triton X-100 at 4°C for 30 min. The beads were washed five times with lysis buffer
containing 1% Triton X-100 and 1 M NaCl. SBP-eIF4A and bound RNAs were
eluted with 25 µl of lysis buffer containing 5 mM biotin at 4°C for 30 min. All
buffers contained 0.001% DMSO with or without 0.03 or 0.3 µM RocA. RNAs
were extracted with QIAzol (Qiagen) using the Direct-zol RNA miniprep (Zymo 
Research). One-third of eluted RNA (~100 ng) was mixed with 1 ng of in vitro 
transcribed, spike-in Renilla luciferase RNA (hRluc) (see ‘DNA constructs’) and 
sequencing libraries were prepared using Tru-seq Ribo-zero gold kit (Illumina). 
Libraries were sequenced on HiSeq2000/2500 (Illumina) sequencers. 

iCLIP. Cells were cultured as described in ‘RIP-seq’. After the medium was replaced
with ice-cold PBS, the dishes on ice were irradiated with 150 mJ cm⁻² of UV-C
(~254 nm) in a UVP CL-1000 (UVP). Lysate was prepared as described in ‘RIP-seq’.
The lysate from a 10 cm dish (600 µl) was treated with 0.4 U of RNase I (Epicentre)
at 37°C for 3 min. The reaction was quenched by the addition of 10 µl of SUPERase In
RNase Inhibitor (Invitrogen), and then incubated with 60 µl of Dynabeads M-270
Streptavidin (Invitrogen) equilibrated with lysis buffer containing 1% Triton X-100
at 4°C for 30 min. The beads were washed twice with CLIP wash buffer (20 mM Tris-Cl
pH 7.4, 1 M NaCl, 2 mM EDTA, 1 mM DTT, and 1% Triton X-100), twice with CLIP
wash buffer containing 0.1% SDS and 0.05% sodium deoxycholate, and then twice
with lysis buffer containing 1% Triton X-100. After discarding the supernatant,
the beads were incubated with 10 U T4 PNK (NEB), 1× PNK buffer, and
0.33 µM [γ-32P]ATP (3,000 Ci mmol⁻¹, PerkinElmer) in 10 µl at 37°C for 5 min
and washed once with lysis buffer containing 1% Triton X-100. RNA-crosslinked
proteins were eluted with 20 µl of lysis buffer containing 1% Triton X-100 and
5 mM biotin at 37°C for 5 min, run on NuPAGE (Invitrogen), and transferred
to a 0.45 µm nitrocellulose membrane (Bio-Rad). The images of 32P-labelled RNA-
protein complex on the membrane were acquired by Typhoon TRIO (Amersham
Biosciences). The membrane region containing SBP-eIF4A/RNA complexes
was excised and treated with 0.1 µg µl⁻¹ Proteinase K (Thermo Scientific),
200 mM Tris-Cl, pH 7.4, 25 mM EDTA, pH 8.0, 300 mM NaCl, and 2% SDS in
200 µl at 55°C for 20 min. RNAs were isolated by phenol/chloroform extraction
and ethanol precipitation. Library preparation was performed according to
the method previously described?! with the following modifications. As linker 
DNA, 5’-(Phos)NNNNNIIIIITGATCGGAAGAGCACACGTCTGAA(ddC)-3’,
where (Phos) indicates 5’ phosphorylation and (ddC) indicates a terminal 2’,
3’-dideoxycytidine, was used. The letters N indicate a random barcode
and the letters I indicate a sample multiplexing barcode. For multiplexing, linker
DNAs containing ATCGT for DMSO replicate number 1, AGCTA for DMSO
replicate number 2, CGTAA for RocA 0.03 µM, CTAGA for RocA 0.3 µM, and
GATCA for RocA 3 µM in the I positions were used. The linker DNAs
were pre-adenylated by 5’ DNA adenylation kit (NEB) before the ligation reaction. 
Instead of gel extraction, unreacted linkers were removed by the treatment of the 
ligation reaction with 5’ deadenylase (NEB) and RecJ exonuclease (Epicentre) 
at 30°C for 45 min. Reverse transcription was performed with an oligonucleo- 
tide 5’-(Phos)NNAGATCGGAAGAGCGTCGTGTAGGGAAAGAG(iSp18) 
GTGACTGGAGTTCAGACGTGTGCTC-3’, where (Phos) indicates 5’ phos-
phorylation and Ns indicate random barcode. PCR was performed with oligo-
nucleotides, 5’-AATGATACGGCGACCACCGAGATCTACACTCTTTCCC
TACACGACGCTC-3’ and 5’-CAAGCAGAAGACGGCATACGAGATCGT
GATGTGACTGGAGTTCAGACGTGTG-3’. Libraries were sequenced on 
HiSeq4000 (Illumina) sequencers. Random barcode was used to eliminate PCR 
duplicates in the library. 
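
As an illustration of how the two barcodes in the linker can be used, the following Python sketch splits a 10-nt tag laid out as NNNNNIIIII, assigns each read to a sample using the I-position barcodes listed above, and collapses PCR duplicates that share the same insert and random barcode. The input layout (pairs of insert sequence and tag), the function names, and the sample labels are assumptions made for illustration. The additional NN barcode in the reverse transcription primer is ignored here, and this is not the analysis code used for the paper.

```python
from collections import defaultdict

# Sample barcodes occupying the five "I" positions of the linker (from the text).
SAMPLE_BARCODES = {
    "ATCGT": "DMSO_rep1",
    "AGCTA": "DMSO_rep2",
    "CGTAA": "RocA_0.03uM",
    "CTAGA": "RocA_0.3uM",
    "GATCA": "RocA_3uM",
}


def split_tag(tag):
    """Split a 10-nt tag laid out as NNNNNIIIII into (random, sample) barcodes."""
    if len(tag) != 10:
        raise ValueError("expected a 10-nt tag: 5 random + 5 sample bases")
    return tag[:5], tag[5:]


def demultiplex_and_dedup(records):
    """Group reads by sample and keep one copy per (insert, random barcode).

    `records` is an iterable of (insert_sequence, tag) pairs; this input
    layout is hypothetical and depends on how the reads were trimmed.
    """
    unique = defaultdict(set)
    for insert, tag in records:
        random_bc, sample_bc = split_tag(tag)
        sample = SAMPLE_BARCODES.get(sample_bc)
        if sample is None:
            continue  # unknown sample barcode: discard the read
        unique[sample].add((insert, random_bc))
    return dict(unique)
```

Two reads with the same insert and the same random barcode are counted once, whereas the same insert with a different random barcode is kept as an independent crosslinking event.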

Bind-n-Seq. SBP-tagged eIF4A was purified as described in ‘RIP-seq’, without
DMSO or RocA treatment. The beads tethering SBP-eIF4A were treated with
1× Micrococcal Nuclease Buffer (NEB), 0.5× lysis buffer, 0.5% Triton X-100,
and 200 U µl⁻¹ Micrococcal Nuclease (NEB) in 30 µl at 25°C for 30 min, washed
five times with lysis buffer containing 1% Triton X-100, 1 M NaCl, and 5 mM
EGTA pH 7.4, and rinsed twice with lysis buffer containing 0.1% Triton X-100.
The beads were incubated in lysis buffer containing 0.1% Triton X-100, 2 mM
AMP-PNP, 0.33 U µl⁻¹ SUPERase In RNase Inhibitor (Invitrogen), 1 µM N30
RNA ((N)30CTGTAGGCACCATCAAT, where CTGTAGGCACCATCAAT is a
DNA sequence for reverse transcription primer hybridization) in 30 µl at 37°C
for 30 min, and washed five times with lysis buffer containing 0.1% Triton X-100,
2 mM AMP-PNP, and 0.1% DMSO. SBP-eIF4A/RNA complex was eluted with 30 µl
of lysis buffer containing 0.1% Triton X-100, 2 mM AMP-PNP, and 5 mM biotin.
DMSO (0.1%) with or without 30 or 300 nM RocA was present in all buffers during 
the RNA binding reaction, wash, and elution. RNAs were extracted with QIAzol 
(Qiagen) using the Direct-zol RNA miniprep (Zymo Research) and converted 
into a DNA library by the same method as for ribosome profiling*". For Bind-n-Seq
with ADP + Pi, 2 mM ADP, 2 mM Na2HPO4, 50 µM N30 RNA, and 3 µM RocA
were used.

The 30-nt randomized RNA followed by 3’ DNA sequence for reverse transcrip- 
tion priming was designed to avoid ligation biases and sequencing of contaminat- 
ing RNA fragments from cells during SBP-eIF4A purification, and to cover the 
entire sequence with a single 50-bp mode of HiSeq (Illumina) sequencers. 

Our read depth (~10⁸ reads) is less than the theoretical complexity (4³⁰ ≈ 10¹⁸),
so that the probability that the same sequence appears multiple times in the library 
is quite low. Therefore, we assumed that reads with exactly the same sequence and 
length in the library reflect PCR duplicates and counted them only once. Motif 
enrichment in the range of interest (4-6 nt) was calculated as the ratio of the motif 
frequency between libraries!®, 
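
The duplicate-collapsing and enrichment steps described above can be sketched in Python as follows. The function names, the use of overlapping k-mers, and the uniform-sequence assumption behind the collision estimate are illustrative choices, not details taken from the paper.

```python
from collections import Counter

# Number of distinct 30-nt sequences: 4**30 = 1152921504606846976 (about 10**18).
COMPLEXITY = 4 ** 30


def expected_duplicate_pairs(n_reads, complexity=COMPLEXITY):
    """Expected number of read pairs that coincide by chance,
    assuming reads are drawn uniformly from all possible sequences."""
    return n_reads * (n_reads - 1) / 2 / complexity


def collapse_duplicates(reads):
    """Keep one copy of each distinct read (identical sequence implies identical length)."""
    return sorted(set(reads))


def kmer_frequencies(reads, k):
    """Frequency of every overlapping k-mer across a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def motif_enrichment(treated_reads, control_reads, k):
    """Ratio of k-mer frequencies between two de-duplicated libraries.
    k-mers absent from either library are omitted."""
    treated = kmer_frequencies(collapse_duplicates(treated_reads), k)
    control = kmer_frequencies(collapse_duplicates(control_reads), k)
    return {m: treated[m] / control[m] for m in treated if m in control}
```

Under the uniform assumption, 10⁸ reads drawn from 4³⁰ possible sequences are expected to contain far fewer than one chance coincidence, which is consistent with treating identical reads as PCR duplicates.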

Spearman’s correlation between the number of motifs in the 5’ UTR and the RIP-seq
-fold change caused by RocA treatment was used as the motif prediction in RIP-seq.
High-scoring motifs were defined as those for which the prediction (RIP-seq) or the
enrichment (Bind-n-Seq) was >1.5 s.d. from the mean.

Data analysis. The reads were aligned to the hg19 human genome reference and
the resulting aligned reads were mapped to University of California, Santa Cruz 
(UCSC) known reference genes, downloaded from the UCSC genome browser in 
July 2013. A UCSC bed file of known genes was used for the 5’ UTR analysis. For 
mitochondria footprints alignments, we used the RefSeq genes track correspond- 
ing to the mitochondrial chromosome (chrM), downloaded from UCSC genome 
browser. Specific A-site nucleotides were empirically estimated on the basis of the 
length of each footprint. The offsets were 14 for 26-29 nt and 15 for 30-31 nt. For 
mitochondria footprints, they were 9 for 26-27 nt, 11 for 28-29 nt, 12 for 30 nt, 
13 for 31 nt, and 18 for 32-34 nt. For mRNA fragments, we used offset 14. For 
measuring footprint density and mRNA fragments of RIP-seq between samples, 
we restricted our analysis to genes that have at least 40 and 100 summed counts
in each sample, respectively. For CDSs, the analysis only included the transcript 
positions beginning 15 codons following the start codon and stopping 5 codons 
preceding the stop codon. For 5’ UTRs, we included the transcript positions from 
the 5’ end of the mRNA until 5 codons preceding the start codon. DESeq** was 
used to calculate relative enrichment of genes in the library, including the mito- 
chondrial footprints and spike-in hRluc mRNA counts. The calculated -fold change 
was re-normalized to the value of the summed mitochondria footprints or the 
spike-in hRluc mRNA fragments. 
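
The length-dependent A-site offsets listed above can be written as lookup tables. In the minimal Python sketch below, the coordinate convention (offset added to the position of the 5’ end of the read), the function name, and the `kind` labels are assumptions for illustration.

```python
# Offsets from the 5' end of a read to the A site, keyed by read length (nt).
CYTOPLASMIC_OFFSETS = {26: 14, 27: 14, 28: 14, 29: 14, 30: 15, 31: 15}
MITOCHONDRIAL_OFFSETS = {
    26: 9, 27: 9, 28: 11, 29: 11, 30: 12, 31: 13, 32: 18, 33: 18, 34: 18,
}
MRNA_FRAGMENT_OFFSET = 14  # single offset used for mRNA fragments


def a_site_position(five_prime_pos, read_length, kind="cytoplasmic"):
    """Return the estimated A-site position of a read.

    Returns None when the read length has no listed offset.
    `kind` is "cytoplasmic", "mitochondrial", or "mrna" (hypothetical labels);
    any other value raises KeyError.
    """
    if kind == "mrna":
        return five_prime_pos + MRNA_FRAGMENT_OFFSET
    table = {
        "cytoplasmic": CYTOPLASMIC_OFFSETS,
        "mitochondrial": MITOCHONDRIAL_OFFSETS,
    }[kind]
    offset = table.get(read_length)
    return None if offset is None else five_prime_pos + offset
```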

High-sensitivity messages were defined as transcripts with a reduction of more than
twofold from the median, and with q value < 0.01, between 3 µM RocA-treated and
untreated cells. Low-sensitivity transcripts were defined in the same way
but with an increase of more than twofold.
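
A minimal Python sketch of this classification is given below. It assumes that fold changes are held as log2 values, so that a twofold difference from the median corresponds to one log2 unit; the function name and the returned labels are hypothetical.

```python
def classify_sensitivity(log2_fold_change, median_log2_fold_change, q_value,
                         q_cutoff=0.01):
    """Label a transcript as "high", "low", or "other" RocA sensitivity.

    A difference of more than 1 in log2 units from the median corresponds
    to a change of more than twofold.
    """
    delta = log2_fold_change - median_log2_fold_change
    if q_value < q_cutoff:
        if delta < -1.0:
            return "high"   # reduced more than twofold relative to the median
        if delta > 1.0:
            return "low"    # increased more than twofold relative to the median
    return "other"
```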

For calculation of ΔG, RNALfold (ViennaRNA Package)? was run with -L30 -g
options on 5’ UTR sequences from UCSC foldUtr5 table. The minimum ΔG along
each 5’ UTR was used as a representative free energy value for each gene.

The presence of G-quadruplexes was predicted with RNAfold (ViennaRNA 
Package). 

The Gini differences across 5’ UTRs were calculated using published data". 
Analysis was restricted to the mRNAs bearing 5’ UTRs having one or more reads 
on A/C positions on average. 

‘uORF translation intensity’ was calculated using published data”’. To incor- 
porate the number and intensity of each upstream initiation site in the 5’ UTR, 
we calculated the density of 5’ UTR footprints for each transcript as mentioned 
above, as the great majority of these footprints derive from ribosomes trapped on
first codons (Extended Data Fig. 10d). To normalize mRNA abundance in cells, we 
normalized the density by footprint counts from the CDS start codon region using 
the genomic position between start codon and 6 nt downstream. We restricted 
the analysis to transcripts with at least ten counts from CDS start codons and 
re-normalized the value to the median as one. 
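
The normalization described above can be sketched as follows in Python. The input layout (a mapping from transcript name to its 5’ UTR footprint density and its CDS start-codon footprint count) and the function name are assumptions for illustration.

```python
from statistics import median


def uorf_translation_intensity(transcripts, min_start_counts=10):
    """Normalize 5' UTR footprint density by CDS start-codon footprint counts.

    `transcripts` maps a name to (utr5_density, start_codon_counts).
    Transcripts with fewer than `min_start_counts` start-codon counts are
    dropped, and the remaining ratios are rescaled so that their median is 1.
    """
    ratios = {
        name: density / counts
        for name, (density, counts) in transcripts.items()
        if counts >= min_start_counts
    }
    if not ratios:
        return {}
    med = median(ratios.values())
    if med == 0:
        raise ValueError("median ratio is zero; cannot rescale")
    return {name: ratio / med for name, ratio in ratios.items()}
```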

Code availability. Scripts to run the analyses mentioned above are available upon 
request. 

Fluorescence polarization assay. Proteins (0-50 µM) were incubated in 14.4 mM
HEPES-NaOH, 108 mM NaCl, 1 mM MgCl2, 0.36 mM TCEP, 14.4% glycerol, 0.1%
DMSO, and 10 nM 5’ FAM-labelled RNA with or without 50 µM RocA in a 10 µl
reaction for 30 min at 25°C. The experiments were performed with 1 mM AMP-
PNP (for AMP-PNP), 1 mM ADP, 5 mM BeCl2, and 25 mM NaF (for ADP-BeFx),
1 mM ADP, 5 mM AlCl3, and 25 mM NaF (for ADP-AlFx), or 1 mM ADP and 1 mM
Na2HPO4 (for ADP + Pi). For the condition without ATP analogue, MgCl2 was
omitted from the reaction.

For competition assay, the complexes were preformed with 1 mM ATP or AMP-
PNP, 1 µM eIF4A, 10 nM FAM-labelled RNA, and 50 µM RocA and chased with
100 µM non-labelled RNA. Because of the low affinity, 50 µM eIF4A was used
with ATP and DMSO.

Fluorescence polarization was measured using an Infinite F-200 PRO (TECAN). 

The dissociation constant (Kd) and half-life (t1/2) were calculated by fitting to
the Hill equation and one-phase exponential decay equation, respectively, using Igor
Pro software (WaveMetrics).
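
For reference, the two model functions named above can be written as in the following Python sketch. The exact parametrization used in Igor Pro is not given in the text, so the baseline and plateau terms, the parameter names, and the function names here are assumptions; the half-life relation t1/2 = ln 2 / k follows from the one-phase decay form.

```python
import math


def hill_binding(conc, kd, n, bottom, top):
    """Hill equation: signal as a function of protein concentration.

    `kd` is the concentration giving a half-maximal signal and `n` is the
    Hill coefficient; `bottom` and `top` are the signal limits.
    """
    return bottom + (top - bottom) * conc ** n / (kd ** n + conc ** n)


def one_phase_decay(t, y0, plateau, k):
    """One-phase exponential decay from y0 towards plateau with rate constant k."""
    return plateau + (y0 - plateau) * math.exp(-k * t)


def half_life(k):
    """Half-life of a one-phase exponential decay with rate constant k."""
    return math.log(2) / k
```

At a protein concentration equal to Kd the Hill function returns the midpoint between `bottom` and `top`, and after one half-life the decay function has covered half of the distance from `y0` to `plateau`.
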

In vitro translation and toeprinting assay. In vitro translation was performed
with nuclease-treated RRL system (Promega), according to the manufacturer’s
instructions. Reporter mRNAs (50 nM; see ‘DNA constructs’) were incubated in
50% RRL with RocA (concentration shown in the figure legends) or 1% DMSO in
10 µl at 30°C for 1 h. For the detection of SBP, 20 µl of the reaction was used with
uORF + CAACAA or uORF + AGAGAG mRNAs and concentrated with 10 µl of
Dynabeads M-270 Streptavidin (Invitrogen).

The toeprinting assay was performed as previously described**. The reaction 
was pre-incubated with RRL in the presence of 2 mM GMP-PNP or m7GTP and
3 µM RocA or 1% DMSO at 30°C for 5 min, and then incubated with 50 nM
mRNAs at 30°C for 5 min, followed by reverse transcription with 10 U µl⁻¹
ProtoScript II (NEB) with 250 nM 5’ 6-FAM labelled primer (5’-6-FAM-
ATGCAGAAAAATCACGGC-3’) at 30°C for 15 min. Recombinant eIF4A
(10 µM) was used instead of RRL in 30 mM HEPES, pH 7.3, 100 mM KOAc,
1 mM Mg(OAc)2, and 1 mM DTT in the presence or absence of 10 µM RocA.
Complementary DNA (cDNA) was purified by phenol extraction, concentrated 
using Oligo Clean & Concentrator (Zymo Research), and run with GeneScan 
600 LIZ Size Standard v2.0 (Life Technologies) on a 3730 DNA Analyzer (Life 
Technologies). Data were analysed by GeneMapper software (Life Technologies). 
For pre-formation of eIF4A/RocA complex on mRNA, 30 µl of the reaction was
loaded on G-25 column equilibrated with 30 mM HEPES, pH 7.3, 100 mM KOAc,
1 mM Mg(OAc)2, and 1 mM DTT to remove free RocA. The flow-through mRNA
was used for in vitro translation at 20nM. 

Dideoxy-terminated sequencing of RNA was performed by reverse transcrip- 
tions using 0.125 mM individual dideoxy-NTP and 0.5 mM each deoxy-NTP with 
the same 5’ 6-FAM-labelled primer and ProtoScript II, according to the manufac- 
turer’s instructions. 

RNase I footprinting assay. Reporter RNA was incubated with recombinant eIF4A 
and RocA in 12 µl as described for the toeprinting assay. The reaction was treated
with 1 µl of 0.001 U µl⁻¹ RNase I (Epicentre) at room temperature for 5 min. After
quenching the digestion by the addition of 1 µl of SUPERase In RNase Inhibitor
(Invitrogen), RNA was extracted by Oligo Clean & Concentrator (Zymo Research)
and reverse transcribed by ProtoScript II (NEB) with 5’ 6-FAM labelled primer
(5’-6-FAM-ATGCAGAAAAATCACGGC-3’) according to the manufacturer’s
instructions. The cDNA was run on 3730 DNA Analyzer (Life Technologies) as
described for the toeprinting assay. Data were analysed by GeneMapper software 
(Life Technologies). 

Polysome profiling. Cell lysate was prepared as described previously*!. Lysate 
containing 15 µg total RNA was loaded on to 10-50% linear sucrose gradients
containing 20 mM Tris-HCl, pH 7.4, 150 mM NaCl, 5 mM MgCl2, 1 mM DTT,
100 µg ml⁻¹ cycloheximide, and 2 U ml⁻¹ SUPERase In RNase Inhibitor and
sedimented by ultracentrifugation at 220,000g for 2h at 4°C with an SW41 rotor 
(Beckman Coulter). Gradients were fractionated using Gradient station (Biocomp). 
Ultraviolet absorbance was detected by ECONO UV monitor (Biorad). 

Metabolic labelling of nascent peptide by OP-puro. Nascent peptides in HEK
293 cells were labelled by 40 µM OP-puro (Jena Bioscience) in 24-well dishes with
0-3 µM RocA for 30 min. Cells were washed with PBS and lysed with 50 µl of lysis
buffer (20 mM Tris-HCl, pH 7.4, 150 mM NaCl, 5 mM MgCl2, and 1 mM DTT)
containing 1% Triton X-100, and then clarified by centrifugation with 20,000g at
4°C for 10 min. Nascent peptides were labelled with 5 µM Alexa Fluor 488 Azide
(ThermoFisher Scientific) by a Click-it Cell Reaction Buffer Kit (ThermoFisher
Scientific) according to the manufacturer’s instructions and run on SDS-PAGE.
Images were acquired by FluorChem R imaging system (ProteinSimple) and
quantified by AlphaView (ProteinSimple).

DNA constructs. DNA fragments containing 5’ UTR sequences, listed below,
were inserted between T7 promoter and ORF of Renilla luciferase (hRluc) in 
psiCHECK2 (Promega). We cloned exactly the same sequence of G-quadruplex 
and its control sequence used in ref. 5. These plasmids were digested by NotI and 
used as in vitro transcription template. 

PTGES3 (uc001slu.4): GCCGCCCGGCCTCACCACCCCTCGTT 
TGCACGCACGCACGTTCATTCTCCGTCCTCGCGCCCCTTTTCCTACACT 
TTCCTCTTCTCCCCGACCGGAGGAGCCGCTCTTTCCGCGCGGTGCATTC 
TGGGGCCCGAGGTCGAGCCCGCCGCTGCCGCCGTCGCCTGAGGGAAG 
CGAGAAGAGGCCGCGACCGGAGAGAAAAAGCGGAGTCGCCACCGGAG 
AGAAGTCGACTCCCTAGCAGCAGCCGCCGCCAGAGAGGCCCGCCCAC 
CAGTTCGCCCGTCCCCCTGCCCCGTTCACA. 

EIF2S3 (uc004dbc.3): TTTCCTTCCTCTTTTGGCAAC. 

HNRNPC (uc001vzy.3): AGGAATGGGGCGGGGACTAGGCCTT 
CGCCTCGGCGGCAGAGGAGACTCGGGGGCCATTTTGTGAAGAGACGAA 
GACTGAGCGGTTGTGGCCGCGTTGCCGACCTCCAGCAGCAGTCGGCT 
TCTCTACGCAGAACCCGGGAGTAGGAGACTCAGAATCGAATCTCTTCT 
CCCTCCCCTTCTTGTGAGATTTTTTTGATCTTCAGCTACATTTTCGGCT 
TTGTGAGAAACCTTACCATCAAACACG.

GPX1 (uc021wxw.1): CAGTTAAAAGGAGGCGCCTGCTGGCCTCCCCTTA 
CAGTGCTTGTTCGGGGCGCTCCGCTGGCTTCTTGGACAATTGCGCC. 

TMA7 (uc003cte.1): GGGGAAGCGGCGGCAGGCGCC. 

KMT2A (uc001pta.3): CTGCTTCACTTCACGGGGCGAAC.

HCV IRES: GCCAGCCCCCTGATGGGGGCGACACTCCACCATGAATC 
ACTCCCCTGTGAGGAACTACTGTCTTCACGCAGAAAGCGTCTAGCCATG 
GCGTTAGTATGAGTGTCGTGCAGCCTCCAGGACCCCCCCTCCCGGGAG 
AGCCATAGTGGTCTGCGGAACCGGTGAGTACACCGGAATTGCCAGGAC 
GACCGGGTCCTTTCTTGGAGTTACCCGCTCAATGCCTGGAGATTTGGG
CGTGCCCCCGCAAGACTGCTAGCCGAGTAGTGTTGGGTCGCGAAAGGC
CTTGTGGTACTGCCTGATAGGGTGCTTGCGAGTGCCCCGGGAGGTCTC 
GTAGACCGTGCACCATGAGCACGAATCCTAAACCTCAAAGAAAAACCA 
AACGTAAC. 

G-quadruplex: CTAGGTTGAAAGTACTTTGACGGCGGCGGCGGTCAA 
TCTTACGGCGGCGGCGGACATAGATACGGCGGCGGCGGTAGAAACTA 
CGGCGGCGGCGGATTAGAATAGTAAA (where letters in bold type represent 
G-quadruplex-forming sequences). 

Randomized control for G-quadruplex: CTAGGGCGCACGTACTT 
CGACAACGTCAGCGTTCAGCGTTCCAACGTCAGCGTACAGCGATCCAA 
CGTCAGCGTTCTGCGCTACAACGTCAGCGTATCCGCGTAGCACA. 

CAA repeat: GAACAACAACAACAACAACAACAACAACAACAAC 
AACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAA 
CACC. 

7x AGAGAG motifs: GAAAGAGAGCAACAAAGAGAGCAACAAAGAG 
AGCAACAAAGAGAGCAACAAAGAGAGCAACAAAGAGAGCAACAAAGA 
GAGCACC. 

1x AGAGAG left: GAAAGAGAGCAACAACAACAACAACAACAA 
CAACAACAACAACAACAACAACAACAACAACAACAACAACAACAACAA 
CAACACC. 

1x AGAGAG middle: GAACAACAACAACAACAACAACAACAACAACAA 
CAACAAAGAGAGCAACAACAACAACAACAACAACAACAACAACAACAA 
CACC. 

1x AGAGAG right: GAACAACAACAACAACAACAACAACAACAACAA 
CAACAACAACAACAACAACAACAACAACAACAACAACAACAAAGAGA 
GCACC. 

Polio virus IRES: TTAAAACAGCTCTGGGGTTGTACCCACCCCAG 
AGGCCCACGTGGCGGCTAGTACTCCGGTATTGCGGTACCCTTGTACGC 
CTGTTTTATACTCCCTTCCCGTAACTTAGACGCACAAAACCAAGTTCAA
TAGAAGGGGGTACAAACCAGTACCACCACGAACAAGCACTTCTGTTTC 
CCCGGTGATGTCGTATAGACTGCTTGCGTGGTTGAAAGCGACGGATCCG 
TTATCCGCTTATGTACTTCGAGAAGCCCAGTACCACCTCGGAATCTTCG 
ATGCGTTGCGCTCAGCACTCAACCCCAGAGTGTAGCTTAGGCTGATGAG
TCTGGACATCCCTCACCGGTGACGGTGGTCCAGGCTGCGTTGGCGGCC 
TACCTATGGCTAACGCCATGGGACGCTAGTTGTGAACAAGGTGTGAAG 
AGCCTATTGAGCTACATAAGAATCCTCCGGCCCCTGAATGCGGCTAATC 
CCAACCTCGGAGCAGGTGGTCACAAACCAGTGATTGGCCTGTCGTAAC 
GCGCAAGTCCGTGGCGGAACCGACTACTTTGGGTGTCCGTGTTTCCTT 
TTATTTTATTGTGGCTGCTTATGGTGACAATCACAGATTGTTATCATAAA
GCGAATTGGATTGGCCATCCGGTGAAAGTGAGACTCATTATCTATCTG 
TTTGCTGGATCCGCTCCATTGAGTGTGTTTACTCTAAGTACAATTTCAAC 
AGTTATTTCAATCAGACAATTGTATCATA. 

Polio virus IRES 3x AGAGAG: TTAAAACAGCTCTGGGGTTGTA 
CCCACCCCAGAGGCCCACGTGGCGGCTAGTACTCCGGTATTGCGGTACC 
CTTGTACGCCTGTTTTATACTCCCTTCCCGTAACTTAGACGCACAAAACC
AAGTTCAATAGAAGGGGGTACAAACCAGTACCACCACGAACAAGCACTT 
CTGTTTCCCCGGTGATGTCGTATAGACTGCTTGCGTGGTTGAAAGCGA 
CGGATCCGTTATCCGCTTATGTACTTCGAGAAGCCCAGTACCACCTCGG
AATCTTCGATGCGTTGCGCTCAGCACTCAACCCCAGAGTGTAGCTTAG
GCTGATGAGTCTGGACATCCCTCACCGGTGACGGTGGTCCAGGCTGCG 
TTGGCGGCCTACCTATGGCTAACGCCATGGGACGCTAGTTGTGAACAA 
GGTGTGAAGAGCCTATTGAGCTACATAAGAATCCTCCGGCCCCTGAAT 
GCGGCTAATCCCAACCTCGGAGCAGGTGGTCACAAACCAGTGATTGG 
CCTGTCGTAACGCGCAAGTCCGTGGCGGAACCGACTACTTTGGGTGTC 
CGTGTTTCCTTTTATTTTATTGTGGCTGCTTATGGTGACAATCACAGATT 
GTTATCATAAAGCGAATTGGATTGGCCATCCGGTGAAAGTGAGACTCAT 
TATCTATCTGTTTGCTGGATCCGCTCCATTGAGAGAGTTTACTCTAAGT
AGAGAGTCAACAGTTATTAGAGAGAGACAATTGTATCATA. 

The following DNA fragments, coding Drosophila msl-2 5’ UTR and SBP, were
amplified by PCR and used for in vitro transcription template. 

uORF + CAACAA: TAATACGACTCACTATAGGGCAGCATAACCATTG 
TTGATGACTCGAGACCTCTCAAACGTAAACCAACAACAAGCACGTGACA 
CCATGGACGAGAAAACCACCGGCTGGCGGGGAGGCCACGTGGTGGAA 
GGGCTGGCAGGCGAGCTGGAACAGCTGCGGGCCAGACTGGAACACCA 
CCCCCAGGGCCAGAGAGAGCCTAGCGGCGGAGGAGACTACAAAGACC 
ATGACGGTGATTATAAAGATCATGACATCGATTACAAGGATGACGATGA 
CAAGTGATTCTAGGCGATCGCTCGAGCCCGGGAATTCGTTTAAACCTA 
GAGCGGCC. 

uORF + AGAGAG: TAATACGACTCACTATAGGGCAGCATAACCATTGTT 
GATGACTCGAGACCTCTCAAACGTAAACCAAAGAGAGGCACGTGACAC 
CATGGACGAGAAAACCACCGGCTGGCGGGGAGGCCACGTGGTGGAAG 
GGCTGGCAGGCGAGCTGGAACAGCTGCGGGCCAGACTGGAACACCACC 
CCCAGGGCCAGAGAGAGCCTAGCGGCGGAGGAGACTACAAAGACCAT 
GACGGTGATTATAAAGATCATGACATCGATTACAAGGATGACGATGACA
AGTGATTCTAGGCGATCGCTCGAGCCCGGGAATTCGTTTAAACCTAGA 
GCGGCC. 

Reporter RNAs were in vitro transcribed, capped, and polyadenylated 
using a T7-Scribe Standard RNA IVT Kit, a ScriptCap m7G Capping System, a
ScriptCap 2’-O-Methyltransferase Kit, and A-Plus Poly(A) Polymerase Tailing Kit 
(CELLSCRIPT). The capping reaction was skipped for polio virus IRES and polio 
virus IRES 3x AGAGAG reporters. 

For the generation of stable cell-lines, PCR products containing CDS regions 
of EIF4A1 mRNA and SBP amplified from cDNA from human adult normal
brain (Invitrogen) and from pASW* (a gift from Y. Tomari), respectively, 
were inserted into HindIII site in pcDNA5/FRT/TO (Invitrogen) by Gibson 
assembly (NEB). P159Q, F163L, and Q195E mutations were introduced by site- 
directed mutagenesis. 

For recombinant eIF4A protein expression, PCR products containing CDS
regions of EIF4A1 mRNA were inserted into pHM-GWA™ to construct pHisMBP-
eIF4A. VX4GKT (A82V) and D296A-T298K mutations were introduced by site-
directed mutagenesis. His-tag, MBP-tag, tobacco etch virus protease cleavage site,
and the N-terminal region of eIF4A (1-237) were cloned into pET-28a, to construct
pHisMBP-eIF4A (1-237).

Reporter assay in HEK 293 cells. Transfections were performed in 24-well dishes
with a TransIT-mRNA Transfection Kit (Mirus) according to the manufacturer's 
instructions, at half scale. Three hours after transfection, RocA was added to the 
medium, and 9h after transfection cells were washed with PBS and lysed with 
Passive lysis buffer (Promega). The luciferase assay was performed with Renilla-Glo 
Luciferase Assay System (Promega) according to the manufacturer's instructions. 
Luminescence was detected with a GloMax-Multi Jr System (Promega).

For stable cell lines with SBP-tagged eIF4A and its mutants, HEK 293 Flp-In 
T-Rex cells were cultured for 4 days with 1 µg ml⁻¹ tetracycline before the experi-
ments. Tetracycline was included in media during experiments.

Quantitative PCR. Cell lysate or in vitro translation reaction for luciferase assay 
was treated with 40 U ml⁻¹ TurboDNase for 10 min on ice, and then RNA was
extracted by TRI Reagent (Sigma) and Direct-zol RNA MiniPrep (Zymo Research).
Reverse transcriptions were performed with ProtoScript II (NEB) and random
primer mix (NEB) according to the manufacturer’s instructions. Quantitative
PCR (qPCR) was performed with Fast EvaGreen qPCR Mix (Biotium) in BioRad 
CFX96 Touch Real Time PCR Detection System (Bio-Rad) with oligonucleotides, 
5’-TCGTCCATGCTGAGAGTGTC-3’ and 5’-CTAACCTCGCCCTTCTCCTT-3’.
RNA from non-transfected cells or in vitro translation reaction without the addi- 
tion of mRNAs was used as qPCR background. 

Purification of recombinant eIF4A proteins. Typically, BL21 Star (DE3)
Escherichia coli cells (Invitrogen) transformed with pHisMBP-eIF4A, pHisMBP-
eIF4A (VX4GKT), pHisMBP-eIF4A (D296A-T298K), or pHisMBP-eIF4A (1-237)
in 1.5 L culture were cultivated to an absorbance at 600 nm (A600 nm) of 0.5 at 37 °C
with 50 µg ml⁻¹ kanamycin and then grown at 16°C overnight with 1 mM IPTG.
The cell pellets were resuspended in His buffer (20 mM HEPES-NaOH, pH 7.5,
500 mM NaCl, 10 mM imidazole, 10 mM β-mercaptoethanol) with 0.5% NP-40,
sonicated, and centrifuged at 35,000g for 20 min. The supernatant was incubated 
with 1.5 ml bed volume of Ni-NTA Superflow (Qiagen) for 1h. The beads were 
loaded on a gravity column and washed with His buffer containing 1 M NaCl. The 
proteins were eluted with 50 mM Na-phosphate buffer, pH 7.5, 500 mM NaCl, 
100 mM Na2SO4, 250 mM imidazole, 10 mM β-mercaptoethanol, treated with
tobacco etch virus protease overnight, dialysed to 20 mM HEPES-NaOH, pH 7.0, 
150mM NaCl, 0.5mM TCEP, and 10% glycerol, and loaded on MBPTrap HP 5 ml 
(GE Healthcare). The flow-through fractions were collected, concentrated with 
Amicon Ultra 10kDa (Millipore), and loaded onto a HiLoad 16/600 Superdex 
75 prep grade column (GE Healthcare) equilibrated with 20 mM HEPES-NaOH, 
pH 7.5, 150 mM NaCl, 0.5 mM TCEP. The peak fractions were collected, concen-
trated with Amicon Ultra 10 kDa (Millipore), mixed with 0.25 volumes of 80%
glycerol, shock-frozen in liquid nitrogen, and stored at −80°C. All purification
steps were performed at 4°C. Column chromatography was performed using an
ÄKTA purifier (GE Healthcare).

Pulldown assay. The lysate of E. coli cells expressing eIF4A wild type (WT) or
eIF4A D296A-T298K proteins from 1 ml culture was prepared as described in
‘Purification of recombinant eIF4A proteins’ and incubated with 10 µl of HisPur
Ni-NTA Magnetic Beads (Thermo Scientific) at 4°C for 30 min. The beads were
washed five times with His buffer containing 1 M NaCl, rinsed once with 20 mM
HEPES-NaOH, pH 7.5, 10 mM NaCl, 10 mM imidazole, 10 mM β-mercaptoethanol,
and incubated with RRL (Promega) at 25°C for 30 min. After five washes with His 
buffer, the proteins were eluted from the beads by SDS sample buffer. 

ATP crosslinking assay. Recombinant eIF4A WT and VX4GKT (10 µM) were incu-
bated with 1 µM [γ-32P]-ATP (3,000 Ci mmol⁻¹, PerkinElmer) in 30 mM Hepes-
KOH, pH 7.3 (Fisher Scientific), 100 mM KOAc, 5 mM Mg(OAc)2, and 1 mM DTT
in a 20 µl reaction at 37°C for 15 min. The reactions were exposed to 1,500 mJ cm⁻²
using UV 254 nm (CL-1000, UVP) at a distance of 2cm from the lamp on ice and 
run on SDS-PAGE. The images were acquired by Typhoon TRIO (Amersham 
Biosciences). 

Western blotting. Anti-eIF4AI (2490, Cell Signaling) (1:1,000), anti-
phospho-eIF2α (Ser51) (D9G8 3398, Cell Signaling) (1:1,000), anti-4E-BP1
(9452, Cell Signaling) (1:2,000), anti-phospho-4EBP (Thr37/46) (236B4 2855,
Cell Signaling) (1:2,000), anti-β-actin (ab20272, Abcam) (1:1,000), anti-eIF4E
(9742, Cell Signaling) (1:1,000), anti-eIF4G (2498, Cell Signaling) (1:1,000), and 
anti-SBP-tag (SB19-C4 sc-101595, Santa Cruz Biotechnology) (1:1,000) were used 
as primary antibodies. Chemiluminescence was induced by Pierce ECL Western 
Blotting Substrate (Thermo Scientific) and images were acquired by a FluorChem 
R imaging system (ProteinSimple). 

Extended Data Figure 1 | RocA represses translation, targeting to eIF4A.
a, Polysome profiling experiments with RocA and PP242 treatments. RocA
disrupts polysomes dose-dependently. b, Western blot of phospho-eIF2α and
phospho-4EBP shows that effect of RocA is independent of known translation
control targeting to eIFs. Phosphorylation of eIF2α and dephosphorylation of
4EBP were induced by thapsigargin and PP242, respectively. c, d, Luciferase
reporter assay possessing PTGES3 5’ UTR (Fig. 1c) with exogenous expression
of WT or RocA-resistant eIF4A mutants (c) and western blot of endogenous and
exogenous eIF4A (d). eIF4A is the main molecular target of RocA. Data represent
mean and s.d. (n = 3). e, f, Correlation of sum of the footprint reads to 13
mitochondrial mRNAs among different conditions (e) and correlation of sum of
the footprint reads from cytoplasmic ribosomes to each transcript between
biological replicates (f). Symbol r is Pearson’s correlation. P value is
calculated by Student’s t-test. g, h, Tile plot of codon periodicity along
length of mitochondria footprints (g, left) and mitochondria footprint length
distribution (g, right) and codon periodicities of 31-nt mitochondrial
footprints among different conditions (h). Footprints with 31-nt length showed
most homogenous codon periodicity, and this periodicity was retained with RocA
treatment, showing that mitochondrial ribosome translates even in high doses
of RocA.

Extended Data Figure 2 | RocA represses translation without mRNA
degradation. a, Metabolic labelling of nascent peptides with OP-puro. The
OP-puro incorporated nascent peptides were visualized by Click reaction
with Alexa Fluor 488 Azide (middle) and quantified (right). Data represent
mean and s.d. (n = 3). b, Correlation of translation -fold change among
different concentrations of RocA treatments. c, MA plot of mean footprint
reads between 0.03 µM RocA treatment and non-treatment normalized to
library sizes to footprints -fold change by 0.03 µM RocA treatment (left)
and the correlation of translation -fold change between 0.03 and 3 µM of
RocA treatments (right), highlighting high-sensitivity mRNAs at 0.03 µM
RocA treatment. d, Scatter plots of footprint -fold change normalized to
mitochondrial footprints and mRNA -fold change by RocA treatments.
RocA represses translation without significant mRNA change. e, qPCR
from the samples of Fig. 1c. Data represent mean and s.d. (n = 3).

Extended Data Figure 3 | Secondary structure in 5’ UTR is not a strong
determinant of RocA sensitivity. a, Cumulative fractions along length
of 5’ UTR, minimum ΔG among all 30-mer windows along a 5’ UTR,
ΔG in cap-proximal region (30 nt) of 5’ UTR, and Gini difference are
plotted for total, RocA high-sensitivity, and RocA low-sensitivity mRNAs.
Significance is calculated by Mann-Whitney U-test. b, Cumulative
fractions along translation -fold change by RocA are plotted for total
mRNAs and mRNAs with predicted G-quadruplexes in 5’ UTRs.
Significance is calculated by Mann-Whitney U-test. The impact of the
presence of a G-quadruplex in the 5’ UTR on RocA sensitivity is modest. c, The
5’ UTRs with G-quadruplexes and randomized control sequence were
fused to Renilla luciferase and these reporter mRNAs were transfected
before treatment with RocA as indicated. Data represent mean and s.d.
(n = 3). The G-quadruplex does not confer prominent RocA sensitivity.

Extended Data Figure 4 | Characterization of translational inhibition
by Hippuristanol and PP242. a, Polysome profiling experiments
with Hipp treatments. Hipp disrupts polysomes dose-dependently.
b, Histograms of number of transcripts along footprints -fold change with
0.01 and 1 µM Hipp treatment compared with non-treatment, normalized
to mitochondrial footprints. Median -fold change is shown. Bin width is
0.1. c, MA plot of mean footprint reads between 1 µM Hipp treatment and
non-treatment normalized to library sizes to translation -fold change by
1 µM Hipp treatment, highlighting high-sensitivity and low-sensitivity
mRNAs. d, Cumulative fractions along length of 5’ UTR, minimum ΔG
among all 30-mer windows along a 5’ UTR, ΔG in cap-proximal region
(30 nt) of 5’ UTR, and Gini difference are plotted to total, Hipp high-
sensitivity, and Hipp low-sensitivity mRNAs. Significance is calculated by
Mann-Whitney U-test. e, Translation -fold changes by RocA and Hipp are
modestly correlated. f, MA plot of mean footprint reads between 2.5 µM
PP242 treatment and non-treatment normalized to library sizes to
translation -fold change by PP242 treatment, highlighting PP242 target
mRNAs. g, Cumulative distributions of translation -fold change caused by
RocA and Hipp treatment are plotted for total and PP242-target mRNAs.
Significance is calculated by Mann-Whitney U-test.
Extended Data Figure 5 | Purification of SBP-tagged eIF4A and
co-purified RNA from HEK 293 cells. a, Western blot of exogenous
SBP-eIF4A and endogenous eIF4A in tetracycline-inducible stable cell
line. Expression of physiological levels of the tagged allele attenuated
endogenous eIF4A expression but preserved overall eIF4A levels,
probably reflecting the same feedback loop previously reported between
eIF4AI and eIF4AII. b, CBB staining of purified SBP-eIF4A and SYBR
Gold staining of purified RNA bound to SBP-eIF4A with or without
micrococcal nuclease (MNase). c, Correlation of sum of the mRNA
fragment reads of each transcript between biological replicates of RIP-seq.
P value is calculated by Student's t-test. d, Histogram of the number of
transcripts along RNA/eIF4A interaction fold change by RIP-seq when
cells are treated with 0.03 or 0.3 µM RocA, normalized to spiked-in RNA.
Data present the same mRNAs analysed in Fig. 1a. Median fold change
is shown. Bin width is 0.1. e, Correlation of RIP fold change between
different concentrations of RocA treatment. f, Correlation of translation
fold change to RIP fold change with the same concentration of RocA
treatment.
Extended Data Figure 6 | Motif enrichment by Bind-n-Seq.
a, Nucleotide composition in each length of reads in input RNAs for
Bind-n-Seq. Input RNAs are random over the entire read length. b, Length
distribution of reads from Bind-n-Seq. RNAs bound to eIF4A showed a
longer length distribution, indicating that eIF4A has a preference for longer
RNAs. c, Correlations of tetramer motif enrichment in Bind-n-Seq by
0.03 µM RocA treatment to that by 0.3 µM RocA treatment. d, Correlations
between pentamer and hexamer motif enrichment in Bind-n-Seq by 0.03 µM
RocA treatment and motif prediction of 0.03 µM RocA effect in RIP-seq.
e, Highest-scoring pentamer and hexamer motifs in Bind-n-Seq and
RIP-seq. f, Cumulative fractions along number of tetramer motifs (Fig. 2b)
in 5’ UTR are plotted for total, RocA high-sensitivity, and RocA
low-sensitivity mRNAs. Significance is calculated by Mann-Whitney U-test.
g, Correlations of Bind-n-Seq motif enrichment (pentamer) by eIF4A to
that by 0.03 µM RocA treatment. The motifs appearing in RNAs used in
Extended Data Fig. 8 are highlighted. h, Correlation of Bind-n-Seq motif
enrichment (pentamer) by eIF4A to motif prediction of Hipp effect in
translation change, which is defined as Spearman's correlation of motif
number in 5’ UTR to translation fold change by Hipp. mRNAs with
high-affinity motifs for eIF4A in the 5’ UTR are resistant to Hipp treatment.
i, The correlation between enriched motifs of replicates in Bind-n-Seq
with ADP + Pi.
Extended Data Figure 7 | Characterization of iCLIP data.
a, CBB staining of purified SBP-eIF4A protein in iCLIP procedure.
b, Visualization of RNA crosslinked with SBP-eIF4A and unknown
proteins by ³²P labelling of RNA. We avoided the contamination of
RNAs crosslinked to the additional, co-purifying, unknown proteins.
c, Distribution of read length in iCLIP libraries. Avoidance of
contaminating RNAs restricted us to short RNAs, which probably
correspond to the region of RNA physically protected by eIF4A binding,
or footprint. d, Nucleotide bias along the reads in iCLIP libraries. The
crosslinking bias for U may underestimate the preference for polypurine
motifs. e, Correlations of iCLIP motif enrichment (tetramer) by different
RocA concentrations. f, Correlations of iCLIP motif enrichment
(tetramer) by 3 µM RocA and motif prediction of 0.03 µM RocA effect in
RIP-seq. The motifs shown in Fig. 2a are highlighted.
Extended Data Figure 8 | eIF4A/RNA affinity measured by fluorescence
polarization. a, CBB staining of recombinant proteins used in this study.
b, Summary of Kd between RNA and eIF4A among the conditions assayed.
c, e-g, i, Direct measurement of the eIF4A/RNA affinity by fluorescence
polarization for eIF4A WT, eIF4A (VX4GKT), or eIF4A (D296A-T298K)
and 5’ FAM-labelled RNAs in the presence or absence of RocA. Data
represent mean and s.d. (n = 3). d, ATP crosslinking assay with eIF4A WT
and eIF4A (VX4GKT). h, Pulldown assay with His-MBP-eIF4A expressed
in E. coli and eIF4E/G in RRL.
Extended Data Figure 9 | Characterization of toeprinting assay.
a, Diagram of the reporters used in this study. b, c, In vitro translation
in RRL with mRNAs containing seven polypurine motif (AGAGAG)
insertions (b) and qPCR from the samples (c). d, Dideoxy-terminated
sequencing of RNA by reverse transcription verified the toeprinting
product length terminated by 48S ribosomes. e, Ribosome toeprinting
assay performed in RRL with m7-GTP, in the presence
or absence of 3 µM RocA treatment. f, Toeprinting assay using 10 µM
recombinant eIF4A in the presence or absence of 10 µM RocA treatment.
g, Toeprinting assay (top) and RNase I footprinting assay (bottom)
using 10 µM recombinant eIF4A with mRNA containing one AGAGAG
motif at the middle in the presence or absence of 10 µM RocA treatment.
h, i, Toeprinting assay using 10 µM recombinant eIF4A (VX4GKT) or
(D296A-T298K) with mRNA containing seven AGAGAG motifs in the
presence or absence of 10 µM RocA treatment. j, Pre-formation of the
complex with RocA and eIF4A (VX4GKT) or (D296A-T298K) on the
mRNA bearing seven polypurine motifs represses the translation from
the mRNA in RRL. k, Basal translation level from mRNA containing
seven AGAGAG motifs with the supplementation of recombinant eIF4A.
l, In vitro translation in RRL with mRNAs with a single polypurine motif
(AGAGAG) insertion at different positions in the 5’ UTR. m, Basal
translation level from mRNAs bearing polio virus IRES and polio virus
IRES with three AGAGAG motifs. In b, c, and h-j, data represent mean
and s.d. (n = 3).
Extended Data Figure 10 | The 5’ UTR footprints accumulated in RocA
treatments come from uORFs. a, The distributions of specific footprint
length, which is a hallmark of 80S ribosomes, from CDS and 5’ UTR are
indistinguishable. b, The change in ribosome footprint counts for 5’ UTRs
and CDSs when cells are treated with 3 µM RocA or 1 µM Hipp compared
with non-treatment, normalized to mitochondrial footprints. Median fold
change is shown. Bin width is 0.1. Analysis is restricted to mRNAs bearing
footprints in the 5’ UTR in the non-treatment condition. c, Meta-gene
analysis of low-sensitivity transcripts to RocA. Reads are normalized to the
sum of mitochondrial footprint reads. d, Illustration of the definition
of uORF translation intensity. e, Transcripts sensitive to RocA contain
more active uORFs, as measured by cumulative distributions of the uORF
translation intensity. Significance is calculated by Mann-Whitney U-test.
f, Summary of the deep sequencing-based approaches used in this study
and corresponding figures.
Molecular architecture of the human sperm IZUMO1
and egg JUNO fertilization complex


Halil Aydin¹, Azmiri Sultana¹, Sheng Li², Annoj Thavalingam¹ & Jeffrey E. Lee¹


Fertilization is an essential biological process in sexual reproduction
and comprises a series of molecular interactions between the sperm
and egg. The fusion of the haploid spermatozoon and oocyte is
the culminating event in mammalian fertilization, enabling the
creation of a new, genetically distinct diploid organism. The
merger of two gametes is achieved through a two-step mechanism
in which the sperm protein IZUMO1 on the equatorial segment of
the acrosome-reacted sperm recognizes its receptor, JUNO, on the
egg surface. This recognition is followed by the fusion of the two
plasma membranes. IZUMO1 and JUNO proteins are indispensable
for fertilization, as constitutive knockdown of either protein
results in mice that are healthy but infertile. Despite their central
importance in reproductive medicine, the molecular architectures of
these proteins and the details of their functional roles in fertilization
are not known. Here we present the crystal structures of human
IZUMO1 and JUNO in unbound and bound conformations. The
human IZUMO1 structure exhibits a distinct boomerang shape and
provides structural insights into the IZUMO family of proteins.
Human IZUMO1 forms a high-affinity complex with JUNO and
undergoes a major conformational change within its N-terminal
domain upon binding to the egg-surface receptor. Our results
provide insights into the molecular basis of sperm-egg recognition,
cross-species fertilization, and the barrier to polyspermy, thereby
promising benefits for the rational development of non-hormonal
contraceptives and fertility treatments for humans and other
mammals.

The journey of a human sperm to an egg ends in the female
oviduct, where the active sperm penetrates through the zona pellucida
glycoprotein layer of the egg to reach the perivitelline space between
the zona layer and the plasma membrane of the oocyte. The active
sperm then fuses with the oocyte membrane to allow the formation of
the zygote. At least two membrane-bound proteins, sperm IZUMO1
and egg JUNO, are essential for gamete recognition, fusion, or both.
Both the IZUMO1 and JUNO (also known as IZUMO1R) genes are
conserved in other mammals (Extended Data Figs 1, 2).

Structural and biochemical studies of IZUMO1 are hampered by
difficulties in recombinant protein expression. Using Drosophila
melanogaster S2 cells, we expressed and purified the extracellular region
of human IZUMO1 (residues 22-254) by Ni²⁺-affinity and gel filtration
chromatography. Biophysical characterization of IZUMO1 revealed a
stable and monomeric protein with extensive mixed α/β secondary
structural characteristics (Extended Data Fig. 3). We obtained crystals
of unbound IZUMO1₂₂₋₂₅₄ and determined its structure at 3.1 Å
resolution. IZUMO1₂₂₋₂₅₄ is a monomer and adopts a distinct boomerang
shape with dimensions of around 85 Å × 25 Å × 22 Å. The overall
structure consists of two domains: a rod-shaped N-terminal four-helix
bundle (4HB; residues 22-134) and a C-terminal immunoglobulin-like
(Ig-like; residues 167-254) domain (Fig. 1 and Supplementary Fig. 1).
Two anti-parallel β-strands (β1 and β2) function like a hinge between
the 4HB and Ig-like domains.


The four helices in the IZUMO1 4HB domain (α1, α2, α3 and α4)
vary from 14 to 30 residues in length. The helices have amphipathic
character with a polar surface exposed to solvent and hydrophobic
residues packing into a core. Helices α1-α2 and α3-α4 are connected
with short five-residue loops (L1 and L3), and a longer 15-residue
loop (L2) links α2 to α3. The 4HB and hinge regions are stabilized
by an extensive network of disulfide linkages (C22-C149, C25-C152,
C135-C159, and C139-C165) and charge-charge interactions
(H44-D101, E80-K154, and R96-E110) that are conserved in almost
all IZUMO1 orthologues and other IZUMO family proteins (Extended
Data Fig. 1 and Supplementary Fig. 2).

The IZUMO1₂₂₋₂₅₄ Ig-like domain resides at the membrane-proximal
end of the molecule. It adopts a seven-stranded (A, B, C, C’, E, F, G)
β-sandwich with the two β-sheets covalently linked by an
Ig-superfamily (IgSF) conserved disulfide bond (C182-C233) between
strands B and F (Fig. 1). Seven-stranded Ig-like folds classically consist
of a 3 + 4 arrangement with β-strands A, B and E forming β-sheet 1, and
β-strands C, C’, F and G forming β-sheet 2 (ref. 13) (Supplementary
Fig. 3). The IZUMO1₂₂₋₂₅₄ Ig-like domain has a novel 2 + 5
organization representing a previously undescribed IgSF subtype. In
IZUMO1₂₂₋₂₅₄, strand A interacts with β-sheet 2 rather than β-sheet 1.
The disulfide bond preceding β-strand A (C139-C165) may constrain
the movement of the strand towards β-sheet 1 and thereby result in this
strand switch (Supplementary Fig. 3). We also determined the crystal
structure of a slightly longer IZUMO1₂₂₋₂₆₈ construct at 2.9 Å
resolution to gain insights into the C-terminal linker region immediately
following the Ig-like domain. The IZUMO1₂₂₋₂₆₈ structure superimposes
well with IZUMO1₂₂₋₂₅₄ (root mean square deviation (r.m.s.d.)
of 1.0 Å over all atoms) (Supplementary Fig. 4). However, no electron
density was observed after residue 256, suggesting that the linker region
following the Ig-like domain is flexible.
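The r.m.s.d. quoted above is the square root of the mean squared distance between paired atoms after superposition. The following is a minimal Python sketch of that calculation, assuming the two coordinate sets are already superimposed and matched atom by atom; the function name `rmsd` and the toy coordinates are illustrative only and are not taken from either structure.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally long lists of
    (x, y, z) coordinates that are already superimposed and paired."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the paired atoms
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Toy coordinates in angstroms (hypothetical): every atom is shifted by 1 A
model_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
model_b = [(0.0, 1.0, 0.0), (1.5, 1.0, 0.0), (3.0, 1.0, 0.0)]
print(rmsd(model_a, model_b))  # prints 1.0
```

In practice the superposition step (finding the optimal rotation and translation) precedes this calculation and is handled by the structure-analysis software.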

JUNO (previously known as folate receptor-δ (FOLR-δ)) is a
glycophosphatidylinositol (GPI)-anchored, cysteine-rich glycoprotein
displayed on the egg surface that has been demonstrated to be the egg
receptor of IZUMO1 (ref. 6). We determined the crystal structure of
unbound JUNO₂₀₋₂₂₈ at 1.8 Å resolution. JUNO₂₀₋₂₂₈ has a globular
architecture that is composed of five short α-helices (α1, α2, α3, α4
and α5), three 3₁₀ helices and two short two-stranded antiparallel
β-sheets (Fig. 1 and Supplementary Fig. 1). Eight conserved disulfide
bonds stabilize the core helices α1, α2, α3 and α4 and the flexible loops.
JUNO shares sequence and structural similarity with human FOLR-α
and FOLR-β (~58% sequence identity and ~1 Å r.m.s.d. over 197 Cα
atoms) (Extended Data Fig. 4). Despite the close similarities, six key
folate binding residues in FOLR-α and FOLR-β are not conserved in
human JUNO, consistent with previous observations that JUNO does
not bind folate (Extended Data Fig. 4). While this manuscript was
under revision, a partial structure of mouse JUNO was determined
(Supplementary Fig. 5).

Biolayer interferometry (BLI) and surface plasmon resonance (SPR)
were used to measure the binding affinities of human IZUMO1 and
JUNO. The interaction is stable with a dissociation constant (Kd) of
about 48-60 nM (Extended Data Fig. 3). In addition, IZUMO1₂₂₋₂₅₄
and JUNO₂₀₋₂₂₈ co-purify as a 1:1 complex on size-exclusion
chromatography. Our results indicate that these two proteins form a
stable complex during the gamete fusion process. To understand the
precise molecular interactions of IZUMO1 and JUNO, we determined
the structure of the IZUMO1₂₂₋₂₅₄–JUNO₂₀₋₂₂₈ complex at 2.4 Å
resolution. In the asymmetric unit, we observed one IZUMO1₂₂₋₂₅₄
molecule binding to one JUNO₂₀₋₂₂₈ molecule with an interface that
spanned a surface area of about 910 Å² (Fig. 2 and Extended Data
Fig. 5). IZUMO1 has been shown to interact with JUNO via its
N-terminal domain. The crystal structure indicates that residues from
all three IZUMO1 regions (4HB, hinge and Ig-like) contact JUNO₂₀₋₂₂₈
through extensive non-bonded van der Waals, hydrophobic
and aromatic interactions (more than 60% of total interface
interactions; Extended Data Fig. 5). There are also two intermolecular salt
bridges (JUNO K163–IZUMO1 E71 and JUNO E45–IZUMO1 R160) and eight
hydrogen bond interactions at the interface (Fig. 2 and Extended
Data Fig. 5). However, all of these interactions are more than 3.0 Å
apart, suggesting that they are weak.

Figure 1 | Overall structures of human IZUMO1 and JUNO. a, Domain
schematics of human IZUMO1 and JUNO. Red Y-shaped and green
lollipop symbols denote N-linked glycans and a glycophosphatidylinositol
(GPI)-anchor, respectively. Regions not observed in the crystal structure
are shaded grey. 4HB, four-helix bundle; CT, cytoplasmic tail; Ig,
immunoglobulin-like domain; SP, signal peptide; TM, transmembrane
region. b, Ribbon representation of unbound IZUMO1₂₂₋₂₅₄ and
JUNO₂₀₋₂₂₈. Cysteine residues that form conserved disulfide linkages are
highlighted in red.

Figure 2 | IZUMO1–JUNO heterotypic
assembly. a, Crystal structure of the human
IZUMO1₂₂₋₂₅₄–JUNO₂₀₋₂₂₈ complex shown as a
ribbon diagram. IZUMO1₂₂₋₂₅₄ and JUNO₂₀₋₂₂₈
are coloured as in Fig. 1. A disordered loop
between the β1 and β2 strands of JUNO₂₀₋₂₂₈ is
shown by a black dashed line. b, Electrostatic
potential surface representation of the
IZUMO1₂₂₋₂₅₄–JUNO₂₀₋₂₂₈ binding interface.
The footprints of the binding interface are shown
by black dashed lines. R160 and E71 on IZUMO1
form salt bridges with E45 and K163 on JUNO,
respectively. c, Binding site interactions of
IZUMO1₂₂₋₂₅₄ and JUNO₂₀₋₂₂₈. Side chains of
key residues involved in hydrogen bond or salt
bridge interactions are shown. d, BLI binding
affinity analysis of IZUMO1₂₂₋₂₅₄ and JUNO₂₀₋₂₂₈
interface mutants. The wild-type IZUMO1–JUNO
interaction is normalized at 100% and the
binding affinities (Kd) for each mutant are shown
as percent reductions compared to wild type.
All experiments were performed with technical
triplicates (n = 3), with mean Kd values ± s.e.m.
shown in Extended Data Table 1.

Figure 3 | Conformational changes in IZUMO1 upon binding to JUNO.
Superposition of structures of unbound IZUMO1₂₂₋₂₅₄ and JUNO₂₀₋₂₂₈
(shown in grey) on the IZUMO1₂₂₋₂₅₄–JUNO₂₀₋₂₂₈ complex (coloured
as in Fig. 1). Black arrows highlight the positional changes in secondary
structure with the corresponding distances shown in angstroms. Inset,
conformational changes within the L2 region during formation of the
complex. The L2 region residue D72 and α4 helix residue Q130 (both
shown in grey) form a hydrogen bond in the unbound IZUMO1₂₂₋₂₅₄
structure. Upon binding to JUNO₂₀₋₂₂₈, the L2 region E71 (orange) forms
an electrostatic interaction with JUNO₂₀₋₂₂₈ K163 (purple).

¹Department of Laboratory Medicine and Pathobiology, Faculty of Medicine, University of Toronto, Toronto, Ontario M5S 1A8, Canada. ²Department of Medicine, University of California, San Diego,
La Jolla, California 92093, USA.

562 | NATURE | VOL 534 | 23 JUNE 2016

© 2016 Macmillan Publishers Limited. All rights reserved


Structural comparison of IZUMO1₂₂₋₂₅₄ and JUNO₂₀₋₂₂₈ alone with
their bound states revealed no major structural changes in JUNO₂₀₋₂₂₈
or the Ig-like domain of IZUMO1 upon complex formation (Fig. 3).
A major conformational change was observed in the 4HB and hinge
regions. All four helices in the 4HB region move about 20 Å towards
JUNO₂₀₋₂₂₈ upon binding, whereas the L2 and hinge regions shift by
about 8 Å. As a result, bound IZUMO1₂₂₋₂₅₄ abandons its distinct
boomerang shape and adopts an upright conformation. The structural
constraints imposed by the short loops between the α1-α2 and α3-α4
helices allow the 4HB domain to translate as a single unit (Fig. 3).

To understand the conformational dynamics of the IZUMO1–JUNO
interaction in solution, we used a hybrid approach that combined
small angle X-ray scattering (SAXS) and deuterium exchange
mass spectrometry (DXMS). Ab initio SAXS reconstructions of
unbound IZUMO1₂₂₋₂₅₄ revealed a distinct boomerang shape,
similar to its crystal structure (Extended Data Fig. 6). When it binds
JUNO₂₀₋₂₂₈, IZUMO1₂₂₋₂₅₄ adopts an upright conformation. Our
DXMS studies revealed that the residues lining the binding interface
exhibited a reduced exchange profile in the complex compared to
the unbound state (Fig. 4, Extended Data Fig. 6 and Supplementary
Fig. 6). Moreover, DXMS experiments performed on IZUMO1 alone
indicated a high level of exchange in the hinge region, suggesting
dynamic flexible motion within this region. Upon binding JUNO,
deuterium exchange of residues 127-140 of IZUMO1 in the hinge
region was reduced by over 50%, which is greater than the reduction
observed in residues at the IZUMO1–JUNO interface (Fig. 4). The
strong level of hydrogen/deuterium protection is due to the formation
of 10 additional main-chain hydrogen bonds within residues
127-140 of the hinge region. This suggests that the IZUMO1 hinge
region is stabilized in a ‘locked’ upright position in the presence of
JUNO.

Mutational studies at the IZUMO1–JUNO interface revealed the
structural determinants required for binding (Fig. 2, Extended Data
Table 1 and Supplementary Fig. 7). Upon binding to JUNO₂₀₋₂₂₈,
a D72-Q130 hydrogen bond between IZUMO1 L2 and the 4HB α4
helix is disrupted to form a new intermolecular salt bridge (IZUMO1
E71–JUNO K163) at the interface (Fig. 3). Mutations of IZUMO1 D72 and
Q130 to alanine did not affect binding to JUNO, as these residues are
not involved at the interface. Alanine and charge-reversal mutations
of the intermolecular IZUMO1 E71–JUNO K163 salt bridge reduced the
binding affinity by approximately twofold, suggesting that this ion pair
has a minor role in binding. Mutations to the IZUMO1 R160–JUNO E45
intermolecular salt bridge result in a roughly 50-fold reduction in binding
affinities, suggesting that this second electrostatic interaction has a major
role in IZUMO1–JUNO recognition. Using SAXS, we characterized the
binding mode of IZUMO1 and JUNO salt bridge mutants that hindered
high-affinity complex formation. The SAXS scattering data and
reconstructions of mutant complexes show no major changes compared to
the wild-type IZUMO1–JUNO complex, and suggest proper complex
formation despite up to 50-fold decrease in affinity (Extended Data
Fig. 7). Notable reductions in binding (more than 20-fold) were also
observed to result from mutations of IZUMO1 and JUNO residues that
are conserved in most mammals (IZUMO1 W148, IZUMO1 H157, JUNO W62
and JUNO L81). In fact, a mutation of IZUMO1 W148 to alanine completely
abolished IZUMO1–JUNO binding. These results agree well with the
cell-oocyte binding assay presented in an accompanying crystal
structure of the same complex. Together, the results of our mutagenesis
study suggest that the interface is probably stabilized through the
combined effects of multiple van der Waals, hydrophobic, aromatic and
electrostatic interactions. This allows the IZUMO1–JUNO interface
to be resilient to mutations.
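To put the fold changes above on an energy scale, a change in dissociation constant can be converted to a change in binding free energy with the standard relation ΔΔG = RT ln(Kd,mutant / Kd,wild type). The sketch below is illustrative only: the temperature of 25 °C is an assumption (the measurement temperature is not stated in this section), and the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # molar gas constant in kcal mol^-1 K^-1

def ddg_from_fold_change(fold_change, temperature_k=298.15):
    """Change in binding free energy (kcal/mol) implied by a fold change
    in Kd (Kd_mutant / Kd_wild_type). Positive values mean weaker binding.
    The default temperature of 298.15 K is an assumption."""
    if fold_change <= 0:
        raise ValueError("fold change must be positive")
    return R_KCAL * temperature_k * math.log(fold_change)

# Fold changes quoted in the text: about 2-fold and about 50-fold
minor = ddg_from_fold_change(2.0)
major = ddg_from_fold_change(50.0)
print(f"2-fold:  {minor:.2f} kcal/mol")
print(f"50-fold: {major:.2f} kcal/mol")
```

Under these assumptions a twofold change corresponds to roughly 0.4 kcal/mol and a 50-fold change to roughly 2.3 kcal/mol, which is in line with the minor and major roles assigned to the two salt bridges.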

Although we observed discernible sequence conservation at the
complex interface in both proteins, comparative sequence analysis
revealed considerable variations among a number of interface residues.
Approximately half of the residues (JUNO: Y44, E45, L58, F77, M83,
R87, M145, Y147, K163; IZUMO1: L69, V141, K150, N151, K153, E155,
A158, Y163, N239, S241) vary across mammalian species (Extended
Data Figs 1, 2). Similar to the species-specific recognition employed
by glycoproteins in the zona pellucida that surrounds the egg, these
residues may act to restrict productive IZUMO1 and JUNO binding
to a specific pair of species. For example, in primates, these
residues are mainly preserved (Extended Data Figs 1, 2), suggesting that
the species-specific diversification of IZUMO1 and JUNO may be
restricted to non-primate mammals.

Figure 4 | Comparative DXMS profiles of human IZUMO1 and JUNO binding. a, b, Difference in hydrogen-deuterium exchange upon complex
formation mapped onto the molecular surfaces of JUNO₂₀₋₂₂₈ (a) and IZUMO1₂₂₋₂₅₄ (b).

Our structural characterizations indicate that human IZUMO1
does not have properties predictive of viral, intracellular or
developmental fusogens, such as influenza A virus haemagglutinin (HA) and
Caenorhabditis elegans epithelial fusion failure-1 (EFF-1) proteins
(Extended Data Fig. 8). This suggests that IZUMO1 does not function
as a direct fusion protein. At least three different fusion mechanisms are
possible (Extended Data Fig. 9). First, IZUMO1 may act as a scaffold
to recruit a protein complex that contains or regulates other fusion
proteins. The requirement of a multiprotein complex for fusion is not
unusual, as some viruses, such as herpes simplex virus-1 or Epstein Barr
virus, require the formation of a multicomponent fusion complex.
Alternatively, Inoue et al. proposed that monomeric IZUMO1 on the
sperm surface interacts with JUNO and, subsequently, a protein
disulfide isomerase facilitates the dimerization of IZUMO1 to allow
it to interact with another oocyte receptor to facilitate fusion. Finally,
the tight heterotypic interaction between human IZUMO1 and JUNO
proteins may be sufficient to bring the sperm and egg membranes
into close apposition and thereby lead to fusion. Regardless, the
conformational changes within IZUMO1 suggest that receptor adhesion
triggers the progression of the 4HB domain to the vicinity of the egg
membrane and the conformational switch may be part of the
structural changes required for fusion. After fusion, the fertilized egg rapidly
sheds JUNO molecules into the perivitelline space. Given our
measured tight nanomolar affinities between IZUMO1 and JUNO, shed
JUNO may essentially act as a rapid ‘sperm-sink’ to neutralize incoming
acrosome-reacted sperm as an additional block to polyspermy. This
process may be analogous to the shedding of viral glycoproteins
(for example, Ebola virus shed and soluble glycoprotein), which can act
as a decoy to absorb antibody responses.


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


No statistical methods were used to predetermine sample size. The experiments 
were not randomized, and investigators were not blinded to allocation during 
experiments and outcome assessment. 

Protein expression and purification. The genes encoding full-length human
IZUMO1 (GenBank accession number: NM_182575, residues 1-350) and human
JUNO (GenBank accession number: NM_001199206, residues 1-250) were codon
optimized for expression in D. melanogaster and gene synthesized (Integrated
DNA Technologies). The DNA sequences encoding the extracellular regions of
IZUMO1 (residues 22-254 and 22-268) and JUNO (residues 20-228) with a BiP
signal peptide were subcloned into a metallothionein promoter pMT expression
vector (Invitrogen) modified with a puromycin selection marker (pMT-puro). All
protein constructs contain a thrombin cleavage site and a 10× His affinity tag at
the C terminus. Binding interface mutants of IZUMO1₂₂₋₂₅₄ and JUNO₂₀₋₂₂₈ were
generated using a modified QuikChange PCR-based site-directed mutagenesis
protocol. The resulting wild-type and mutant IZUMO1₂₂₋₂₅₄, IZUMO1₂₂₋₂₆₈ and
JUNO₂₀₋₂₂₈ pMT-puro expression plasmids were stably transfected in Drosophila
S2 cells (Invitrogen) using Effectene transfection reagent (Qiagen) according to
the manufacturer's protocol. (S2 cells have been tested by Invitrogen for
contamination of bacteria, yeast, mycoplasma and viruses, and been characterized by
isozyme and karyotype analysis.) Briefly, Drosophila S2 cells were cultured in
Schneider’s medium (Lonza) supplemented with 10% (v/v) heat-inactivated fetal
bovine serum (FBS) plus 1× antibiotic-antimycotic (Gibco), and propagated at
27 °C. The day before transfection, 3 × 10⁶ cells were seeded per well in a 6-well
plate (Corning) with 3.0 ml complete growth medium and incubated overnight. On
the day of transfection, 2 µg expression plasmid was mixed with the transfection
reagents and the transfection complexes were added drop-wise onto the S2 cells.
At 72 h post-transfection, the cultured medium was replaced with fresh S2 growth
medium supplemented with 6 µg ml⁻¹ puromycin (Bioshop). Subsequently, S2 cells
were gradually adapted to FBS-free Insect-XPRESS growth media (Lonza) with
6 µg ml⁻¹ puromycin. Stably transfected cells were grown to 1 × 10⁷ cells per ml in
Insect-XPRESS growth medium using vented 2-l polycarbonate Erlenmeyer flasks
(VWR) at 27 °C. Protein expression was induced with 500 µM final concentration
of sterile-filtered CuSO₄. Cultured medium was collected 6 days after induction,
clarified by centrifugation at 6,750g for 20 min, concentrated and buffer exchanged
into Ni-NTA binding buffer (20 mM Tris-HCl (pH 8.0), 300 mM NaCl, 20 mM
imidazole) using a Centramate tangential flow filtration system (Pall Corp.). All
IZUMO1 and JUNO proteins were purified by Ni-NTA metal affinity
chromatography. Eluted samples were buffer exchanged into TBS (10 mM Tris-HCl (pH 8.0),
150 mM NaCl) using a PD-10 desalting column (GE Life Sciences) and thrombin
(EMD Millipore) digested at 22 °C for 24 h (1:2,000 (w/w) enzyme:protein ratio).
The cleaved protein samples were then buffer exchanged to a low pH buffer (10 mM
sodium acetate (pH 5.6), 150 mM NaCl) and enzymatically deglycosylated using
100 U endoglycosidase H (New England Biolabs) per mg of IZUMO1 or JUNO at
22 °C for 16 h. To prepare the IZUMO1–JUNO protein complexes, deglycosylated
IZUMO1₂₂₋₂₅₄ and JUNO₂₀₋₂₂₈ samples were mixed at a molar ratio of 1:1 and
incubated at 22 °C for 2 h before size-exclusion chromatography on a custom
prep-grade Superdex-200 XK 16/70 column equilibrated with TBS. Peak fractions were
pooled and protein concentrations were quantified by measuring A₂₈₀.
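Quantification by absorbance at 280 nm rests on the Beer-Lambert law, c = A / (ε · l). The following minimal Python sketch shows the arithmetic; the extinction coefficient, molecular mass and absorbance reading are placeholder values chosen for illustration and are not the values for IZUMO1 or JUNO, which are not given in this section.

```python
def molar_concentration(a280, extinction_coeff, path_length_cm=1.0):
    """Beer-Lambert law: concentration (mol/l) = A / (epsilon * l).
    extinction_coeff is in M^-1 cm^-1 and path_length_cm in cm."""
    if extinction_coeff <= 0 or path_length_cm <= 0:
        raise ValueError("extinction coefficient and path length must be positive")
    return a280 / (extinction_coeff * path_length_cm)

def mg_per_ml(molar, molecular_mass_da):
    """Convert mol/l to mg/ml (numerically equal to g/l)."""
    return molar * molecular_mass_da

# Hypothetical example values, for illustration only
EPSILON = 50_000.0   # M^-1 cm^-1 (placeholder)
MASS_DA = 27_000.0   # Da (placeholder)
conc_molar = molar_concentration(0.5, EPSILON)
conc_mg_ml = mg_per_ml(conc_molar, MASS_DA)
print(f"{conc_molar * 1e6:.1f} uM, {conc_mg_ml:.2f} mg/ml")
```

For a real sample, the extinction coefficient would typically be computed from the construct sequence after tag removal, since the tag and cleavage site change the aromatic residue content.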

Circular dichroism spectroscopy. Circular dichroism spectra of human
IZUMO1₂₂₋₂₆₈ were acquired on a Jasco J-810 spectropolarimeter using a 1-mm
quartz cuvette (Helma). Circular dichroism measurements were conducted with
50-100 µM protein samples purified in 10 mM potassium phosphate (pH 7.5) and
150 mM NaCl buffer. Wavelength scans were recorded at 25 °C between 190 nm
and 250 nm and averaged over five accumulations. Data were converted to mean
residue ellipticity and secondary structure content was estimated using the K2D
algorithm in the DichroWeb analysis server. Thermal denaturation assays were
performed at a wavelength of 222 nm by increasing the temperature from 20 °C to
99 °C and monitoring the change in ellipticity as a function of temperature. The
data were baseline corrected with buffer blank, normalized between 0 (folded)
and 1 (unfolded) and fit to a nonlinear biphasic sigmoidal curve using GraphPad
Prism (GraphPad Software).
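The normalization step of the thermal denaturation analysis can be sketched as follows. This is a simplified illustration, not the GraphPad Prism fit described above: it rescales the ellipticity at 222 nm between its most negative (folded) and least negative (unfolded) values, then estimates an apparent midpoint by linear interpolation rather than by fitting a sigmoidal curve. The function names and data points are hypothetical.

```python
def normalize_unfolding(ellipticity):
    """Rescale a melt curve so the most negative value (folded) maps to 0
    and the least negative value (unfolded) maps to 1."""
    folded, unfolded = min(ellipticity), max(ellipticity)
    if folded == unfolded:
        raise ValueError("signal does not change; cannot normalize")
    return [(value - folded) / (unfolded - folded) for value in ellipticity]

def apparent_midpoint(temperatures, fraction_unfolded):
    """Temperature at which the normalized curve first crosses 0.5,
    found by linear interpolation between neighbouring points."""
    points = list(zip(temperatures, fraction_unfolded))
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if f0 < 0.5 <= f1:
            return t0 + (0.5 - f0) / (f1 - f0) * (t1 - t0)
    raise ValueError("curve never crosses 0.5")

# Hypothetical melt data: temperature (deg C) and ellipticity at 222 nm
temps = [20.0, 40.0, 60.0, 80.0, 99.0]
signal = [-20.0, -18.0, -12.0, -6.0, -4.0]
fraction = normalize_unfolding(signal)
tm = apparent_midpoint(temps, fraction)
print(fraction, tm)
```

A biphasic fit, as used in the Methods, would model two transitions instead of one, so a single interpolated midpoint is only a rough summary of such a curve.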

Dynamic light scattering (DLS). IZUMO1₂₂₋₂₅₄, JUNO₂₀₋₂₂₈ and
IZUMO1₂₂₋₂₅₄-JUNO₂₀₋₂₂₈ complex samples were prepared in TBS with 2% (v/v)
glycerol and concentrated to 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 4.0 and 5.0 mg ml⁻¹
before the DLS measurements. DLS experiments were performed at 25 °C on a
DynaPro Plate Reader II (Wyatt Technology). For each condition, 22 µl of sample
was loaded in triplicate onto a black 384-well clear-bottom plate (Greiner).
Data were acquired over 5 s with a total of ten acquisitions for each
concentration. The polydispersity and hydrodynamic radius (Rh) of the molecules
in solution were calculated using Dynamics (v.7) software (Wyatt Technology).
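For orientation, a hydrodynamic radius is conventionally obtained from the measured translational diffusion coefficient through the Stokes-Einstein relation, Rh = kB·T / (6π·η·D). The sketch below illustrates that relation only; it is not the Dynamics software's algorithm, and the default viscosity (pure water at 25 °C) is an assumption that ignores the 2% glycerol in the sample buffer.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def hydrodynamic_radius_nm(diff_coeff_m2_s, temp_k=298.15, viscosity_pa_s=8.9e-4):
    """Stokes-Einstein: Rh = kB*T / (6*pi*eta*D), returned in nanometres."""
    return K_B * temp_k / (6.0 * math.pi * viscosity_pa_s * diff_coeff_m2_s) * 1e9


def diffusion_coefficient(radius_nm, temp_k=298.15, viscosity_pa_s=8.9e-4):
    """Inverse relation: D (m^2/s) for a sphere of the given radius."""
    return K_B * temp_k / (6.0 * math.pi * viscosity_pa_s * radius_nm * 1e-9)


# Round trip for a hypothetical 3.0-nm particle
d = diffusion_coefficient(3.0)
rh = hydrodynamic_radius_nm(d)
```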

Size exclusion chromatography-multiangle light scattering (SEC-MALS).
The oligomeric state of tag-removed, glycosylated IZUMO1₂₂₋₂₆₈ was assessed
by multiangle light scattering. Monomeric bovine serum albumin (BSA)
standard (2 mg ml⁻¹; 66,432 Da) dissolved in PBS buffer (10 mM phosphate
(pH 7.4), 2.7 mM KCl, 137 mM NaCl) was used to calibrate the MALS detectors.
IZUMO1₂₂₋₂₆₈ was purified on an analytical Superdex-75 10/300 GL size exclusion
column equilibrated in PBS buffer to ensure proper monodispersity. Then, 600 µg
IZUMO1₂₂₋₂₆₈ was applied onto a PBS-equilibrated Superdex-200 Increase 10/300
GL size exclusion column in-line with a Viscotek MALS detector (Malvern). The
data were processed and the weight-averaged molecular mass was calculated using
the OMNISEC (v. 5.1) software package (Malvern).

Biolayer interferometry (BLI). The binding affinities of IZUMO1₂₂₋₂₅₄ to
JUNO₂₀₋₂₂₈ were measured by biolayer interferometry using a single-channel BLItz
instrument (Pall FortéBio), based on protocols previously described. Briefly,
purified wild-type or mutant JUNO₂₀₋₂₂₈ in PBS buffer was biotinylated using the
EZ-Link sulfo-NHS-LC-biotinylation kit (Thermo Pierce), according to the
manufacturer’s instructions. Excess biotin reagent was removed by overnight dialysis
in PBS. All streptavidin-coated (SA) biosensors were hydrated in BLI rehydration
buffer (PBS, 0.5 mg ml⁻¹ BSA and 0.01% (v/v) Tween-20) for 10 min. Biotinylated
JUNO₂₀₋₂₂₈ (bait) was diluted in BLI kinetics buffer (PBS, 0.1 mg ml⁻¹ BSA
and 0.01% (v/v) Tween-20) to a final concentration of 20 µg ml⁻¹ and immobilized
onto an SA biosensor for 90 s. Multiple concentrations of wild-type or mutant
IZUMO1₂₂₋₂₅₄ (analyte) were prepared in BLI kinetics buffer and association of
IZUMO1₂₂₋₂₅₄ with the immobilized bait was measured over 90 s at 20 °C.
Subsequently, the SA biosensor was immersed into BLI kinetics buffer for 90 s to
dissociate the analyte. All experiments were performed in triplicate. Two negative
controls were performed: BSA and BLI kinetics buffer only against SA biosensors
loaded with biotinylated bait to detect non-specific binding. The data were
analysed and sensorgrams were step corrected, reference corrected and fitted
globally to a 1:1 binding model. The equilibrium dissociation constant (KD),
association (ka) and dissociation (kd) rate constants and their associated
standard errors were calculated using BLItz Pro data analysis (v. 1.1.0.16)
software.
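The 1:1 binding model referred to above has a simple closed form, sketched below for illustration. This is the textbook pseudo-first-order model, not the BLItz Pro fitting code; the function names, rate constants and concentrations are hypothetical.

```python
import math


def kd_from_rates(ka, kd):
    """Equilibrium dissociation constant KD (M) = kd (s^-1) / ka (M^-1 s^-1)."""
    return kd / ka


def association(t, conc, ka, kd, rmax):
    """Sensor response during association for a 1:1 interaction.

    R(t) = Req * (1 - exp(-(ka*C + kd) * t)), with Req = Rmax * C / (C + KD).
    """
    k_obs = ka * conc + kd
    r_eq = rmax * conc / (conc + kd / ka)
    return r_eq * (1.0 - math.exp(-k_obs * t))


def dissociation(t, r0, kd):
    """Sensor response during dissociation: R(t) = R0 * exp(-kd * t)."""
    return r0 * math.exp(-kd * t)


# Hypothetical constants: ka = 1e5 M^-1 s^-1, kd = 1e-2 s^-1, analyte at 100 nM
kD = kd_from_rates(1e5, 1e-2)
plateau = association(1e6, 1e-7, 1e5, 1e-2, 1.0)  # long time, C equal to KD
start = association(0.0, 1e-7, 1e5, 1e-2, 1.0)
```

A global fit shares ka, kd and Rmax across all analyte concentrations, so each sensorgram differs only in its concentration term.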

Surface plasmon resonance (SPR). The affinities and kinetics of wild-type
IZUMO1₂₂₋₂₅₄ binding to wild-type JUNO₂₀₋₂₂₈ were assessed by SPR on a
Biosensing Instruments BI-4000 system at 20 °C using a CM-dextran sensor chip.
Prior to immobilization, pH scouting between pH 4.5 and 6.5 was performed to
identify the optimal pH for immobilization. Wild-type JUNO₂₀₋₂₂₈ was immobilized
using a coupling buffer containing 10 mM sodium acetate pH 5.0 onto one
of two flow channels using the manufacturer’s standard amine-coupling protocol.
Association of the wild-type IZUMO1₂₂₋₂₅₄ analyte (0.75 µM, 0.5 µM, 0.375 µM,
0.25 µM, 0.188 µM, 0.125 µM, 0.0937 µM and 0 µM) was measured at a flow rate of
50 µl min⁻¹ for 90 s. The second flow cell, containing no bait, was injected with PBS
buffer in a serial flow and used as a reference. Subsequently, PBS buffer was injected
at a flow rate of 50 µl min⁻¹ over 180 s to dissociate wild-type IZUMO1₂₂₋₂₅₄. The
cells were regenerated between two analyte runs using the rapid injection protocol
involving 8 cycles of 20-µl injections of 0.01 M NaOH-acetate pH 9.0 followed
by an equal volume of 1× PBS. Measurements were performed in triplicate. The
resulting SPR sensorgrams were corrected with the reference and blank (0-µM
analyte) curves, and fitted globally with a 1:1 Langmuir binding model using BI-Data
Analysis and BI-Kinetic Analysis SPR software.

Crystallization and X-ray data collection. Purified IZUMO1₂₂₋₂₅₄, IZUMO1₂₂₋₂₆₈,
JUNO₂₀₋₂₂₈ and the IZUMO1₂₂₋₂₅₄-JUNO₂₀₋₂₂₈ complex were concentrated
to 10 mg ml⁻¹. All crystallization trials were performed at 22 °C by sitting drop
vapour diffusion (300 nl protein and 300 nl mother liquor) in 96-well low profile
Intelliplates (Art Robbins) using an Oryx8 protein crystallization robot (Douglas
Instruments).

IZUMO1₂₂₋₂₅₄ and IZUMO1₂₂₋₂₆₈. Initial sparse matrix screening of IZUMO1₂₂₋₂₅₄
and IZUMO1₂₂₋₂₆₈ constructs identified needle-shaped crystals in multiple
conditions. IZUMO1₂₂₋₂₅₄ and IZUMO1₂₂₋₂₆₈ crystals were manually optimized
in 48-well MRC Maxi crystallization plates using 2-µl sitting drops. Larger
needle-shaped IZUMO1₂₂₋₂₅₄ crystals appeared the next day and reached a
maximum length of ~250 µm within 3-4 days in 0.07 M sodium acetate (pH 4.6),
5.6% (w/v) PEG 4000 and 30% (v/v) glycerol. Larger IZUMO1₂₂₋₂₆₈ crystals
were more difficult to obtain and required further optimization using random
microseed matrix screening (rMMS) with Oryx8 (ref. 31). rMMS led to thicker
needle crystals in 0.085 M HEPES sodium salt (pH 7.5), 8.5% (v/v) isopropanol,
17% (w/v) PEG 4000 and 15% (v/v) glycerol. These crystals reached a final length of
~200 µm within 4-5 days. All crystals were cryoprotected and flash-cooled in
liquid nitrogen. IZUMO1₂₂₋₂₅₄ and IZUMO1₂₂₋₂₆₈ crystals diffracted to Bragg
spacings of 3.1 Å and 2.9 Å, respectively, and data sets were remotely collected at
the Canadian Light Source (CLS) 08ID-1 beamline (Supplementary Fig. 8).

JUNO₂₀₋₂₂₈. Rod-shaped JUNO₂₀₋₂₂₈ crystals were grown in 0.02 M magnesium
chloride, 0.1 M HEPES sodium salt (pH 7.5), 22% (w/v) polyacrylic
acid 5100. Crystals typically appeared after 3-4 days and reached full size in
1 week. The mother liquor supplemented with increasing amounts of sucrose
(up to 30% (w/v) in final solution) was used as a cryoprotectant before the
crystals were rapidly cooled in liquid nitrogen. JUNO crystals readily produced
Bragg reflections at better than 2.0 Å resolution on a Rigaku FR-E Superbright
X-ray generator and Saturn A200 HD CCD detector (Rigaku Corp.), and a 1.8 Å
resolution data set was collected at the Structural Genomics Consortium X-ray
diffraction facility (Supplementary Fig. 8).

IZUMO1₂₂₋₂₅₄-JUNO₂₀₋₂₂₈ complex. Crystals of the protein complex were
grown in sitting drops containing equal volumes (1 µl) of purified protein and
crystallant (0.1 M MES (pH 6.5), 20% (w/v) PEG 4000 and 0.6 M NaCl). Crystals
were observed after 3 days and matured to full size within a week. Crystals were
cryoprotected by sequential soaking in mother liquor with 5%, 10%, 20% and
30% (w/v) sucrose. Crystals were directly immersed into liquid nitrogen and
screened at the CLS beamline 08ID-1. A data set was collected from a single
IZUMO1₂₂₋₂₅₄-JUNO₂₀₋₂₂₈ complex crystal diffracting to 2.4 Å resolution
(Supplementary Fig. 9).

Structure determination and refinement. All diffraction data were integrated
and reduced with the XDS program package and scaled using Aimless from
the CCP4 program suite. Crystallographic data collection and final refinement
statistics are presented in Supplementary Table 1.

JUNO₂₀₋₂₂₈. The structure of JUNO₂₀₋₂₂₈ was determined by molecular replacement
with Phaser using human folate receptor-α (PDB ID: 4LRH) as the search
model. Initial characterization of the JUNO₂₀₋₂₂₈ X-ray data using phenix.xtriage
and DETWIN indicated translational pseudosymmetry (TPS) and near-perfect
merohedral twinning with an estimated twin fraction of 0.45 (Supplementary
Fig. 8). The twinning fraction was calculated from the cumulative distribution of
H and Britton plots, with the twin fractions related by the twin law k, h, −l.
It was necessary to apply the twin law throughout the refinement to further refine
the JUNO₂₀₋₂₂₈ structure using phenix.refine.

IZUMO1₂₂₋₂₅₄-JUNO₂₀₋₂₂₈ complex. The initial phases for the IZUMO1₂₂₋₂₅₄-
JUNO₂₀₋₂₂₈ complex were calculated via molecular replacement with Phaser,
using the human folate receptor-α structure (PDB ID: 4LRH) as an initial search
model. One clear solution (Z = 14.6) was identified. Strong electron density was
observed for JUNO₂₀₋₂₂₈ and the 4HB domain of IZUMO1₂₂₋₂₅₄. The poly-alanine
chain of IZUMO1₂₂₋₂₅₄ was initially traced by a combination of
phenix.autobuild and manual building with Coot.

Proper sequence registry was validated by locating the sulfur
anomalous signals from methionine and cysteine residues. Multi-crystal sulfur
anomalous data were collected on native IZUMO1₂₂₋₂₅₄-JUNO₂₀₋₂₂₈ complex
crystals. The X-ray beam was focused to 50 µm and the sulfur anomalous signal
was measured at a wavelength of 1.7712 Å using a MarMosaic MX300 CCD detector
(Rayonix). All crystals were rod-shaped and >400 µm in length, thus allowing
us to translate along the rotation axis to expose a fresh undamaged part of the
crystal. 360° of data with a rotation angle of 1.0° per frame were collected for
each set before translating to a new part of the crystal. Each data set was
processed individually with anomalous signal using XDS. Twenty-four data sets with
Rmeas < 10% were merged together using XSCALE and converted to CCP4 data
format using XDSCONV, F2MTZ and CAD. The overall Rmerge and anomalous
multiplicity for the merged data set to 2.8 Å resolution were 9.9% and 89.6,
respectively. An anomalous difference Fourier electron density map was calculated
using PHENIX and confirmed the correct location of all 38 protein sulfur sites
(Supplementary Fig. 9).

IZUMO1₂₂₋₂₅₄ and IZUMO1₂₂₋₂₆₈. The structures of IZUMO1₂₂₋₂₅₄ and
IZUMO1₂₂₋₂₆₈ were determined by molecular replacement. An initial molecular
replacement search using the refined IZUMO1₂₂₋₂₅₄ structure from the
IZUMO1₂₂₋₂₅₄-JUNO₂₀₋₂₂₈ complex failed, probably because of conformational
changes between the 4HB and Ig-like domains. A molecular replacement search
was performed first using the IZUMO1 Ig-like domain (residues 167-254)
followed by a second molecular replacement search using the 4HB and hinge regions
(residues 22-166). Clear solutions were identified for both sections in IZUMO1₂₂₋₂₅₄
and IZUMO1₂₂₋₂₆₈.

All structures were manually rebuilt using Coot and refined using
phenix.refine. No non-crystallographic symmetry (NCS) restraints were employed
except in the case of JUNO₂₀₋₂₂₈, where a four-fold NCS was applied. All β-strands
and α-helices were real-space refined with torsional secondary structural restraints
using Coot. Torsion-angle simulated annealing refinement, starting at 5,000 K, with
individual atomic displacement and Translation/Libration/Screw (TLS) groups
was carried out using Phenix. Owing to the lower resolutions of the IZUMO1₂₂₋₂₅₄
and IZUMO1₂₂₋₂₆₈ data, these structures were refined with grouped B-factor
refinement. Calculation of annealed 2|Fo| − |Fc| composite omit maps helped
minimize model bias during rebuilding.

Validation and structure analysis. The stereochemical quality of all the refined
models was validated using MolProbity, PROCHECK and Coot. No residues
were identified in disallowed regions of the Ramachandran plot. Moreover,
the R values, B-factors, and r.m.s.d. bond lengths and angles of all structural models
are consistent with other deposited structures determined at similar resolutions, as
validated by polygon.phenix. All structural representations were prepared using
PyMOL (v. 1.7.4; Schrödinger, LLC).

SAXS data collection and reconstruction. SAXS provides medium-resolution
visualization of global structural conformational changes in solution. SAXS
experiments were performed by mail-in SAXS on beamline 12.3.1 (SIBYLS)
at the Advanced Light Source. Various protein concentrations of tag-cleaved,
deglycosylated wild-type and mutant IZUMO1₂₂₋₂₅₄, JUNO₂₀₋₂₂₈
and IZUMO1₂₂₋₂₅₄-JUNO₂₀₋₂₂₈ complexes, along with matching buffer obtained
from SEC (10 mM Tris-HCl (pH 8.0), 150 mM NaCl and 2% (v/v) glycerol),
were loaded into a 96-well PCR plate (Axygen) and stored at 4 °C before data
collection. Samples were loaded into the SAXS sample cell using a Hamilton
syringe robot. For the wild-type IZUMO1₂₂₋₂₅₄, JUNO₂₀₋₂₂₈ and IZUMO1₂₂₋₂₅₄-
JUNO₂₀₋₂₂₈ complex, data were collected at a wavelength (λ) of 1.0 Å using a
MarCCD 165 detector (Rayonix) positioned at a distance of 1.5 m, resulting
in a scattering vector range of 0.01 Å⁻¹ < q < 0.32 Å⁻¹ (where q = 4π sin(θ/2)/λ
and θ is the scattering angle). Each data set was recorded at 283 K in a succession
of three X-ray exposures of 0.5, 1, 2 and 5 s. For the mutant IZUMO1₂₂₋₂₅₄-
JUNO₂₀₋₂₂₈ complexes, data were collected at 1.0 Å wavelength using a
Pilatus3 X 2M detector (Dectris) (0.01 Å⁻¹ < q < 0.55 Å⁻¹). Data for the mutant
IZUMO1₂₂₋₂₅₄-JUNO₂₀₋₂₂₈ complexes were recorded in a time slicing mode
of 0.5-s exposures over 15 s (30 frames per sample). Data for buffer blanks
were collected before each protein image and subsequently subtracted.
Sample radiation damage was assessed by overlaying short and long exposures
and checking for any shifts in the scattering curves using the program
SCATTER. Concentration and aggregation effects were detected by comparing
the lowest scattering angles for each of the protein samples. Fits to the Guinier
region were made using autoRg. To maximize the signal-to-noise ratio, the
SAXS scattering curve at the highest concentration that is free of interparticle
interference was used for subsequent analysis. The characteristic real-space
distance distribution function, P(r), and the maximum dimension, Dmax, were
determined from the scattering data using an indirect Fourier transformation.
All ab initio reconstructions of molecular envelopes from SAXS data were
performed using the program DAMMIN. Twenty-three DAMMIN models
were superimposed and averaged by the program DAMAVER to obtain a
consensus averaged structure. Alignment of the SAXS reconstructions with
the final refined crystal structures was performed using Chimera.
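The Guinier analysis mentioned above can be illustrated with a short sketch. It is not the autoRg algorithm: it fits a straight line to ln I(q) against q² using the Guinier approximation I(q) ≈ I0·exp(−q²Rg²/3) and then checks the usual qRg < 1.3 validity limit. The function names and the synthetic data (Rg = 25 Å, I0 = 100) are hypothetical.

```python
import math


def scattering_vector(theta_rad, wavelength_a):
    """q = 4*pi*sin(theta/2)/lambda, with theta the full scattering angle."""
    return 4.0 * math.pi * math.sin(theta_rad / 2.0) / wavelength_a


def guinier_fit(q, intensity):
    """Least-squares line through ln I versus q^2; returns (Rg, I0)."""
    x = [qi * qi for qi in q]
    y = [math.log(i) for i in intensity]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                      # equals -Rg^2 / 3
    intercept = mean_y - slope * mean_x    # equals ln I0
    return math.sqrt(-3.0 * slope), math.exp(intercept)


# Synthetic low-q data for a particle with Rg = 25 A and I0 = 100
q = [0.010 + 0.004 * i for i in range(11)]
intensity = [100.0 * math.exp(-((qi * 25.0) ** 2) / 3.0) for qi in q]

rg, i0 = guinier_fit(q, intensity)
guinier_valid = max(q) * rg < 1.3
```

In practice the fitted q range is narrowed until the validity limit holds and the residuals show no systematic curvature, which would indicate aggregation or interparticle repulsion.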
Deuterium exchange mass spectrometry (DXMS). DXMS, which measures
kinetics of backbone amide solvent exchange, provides local residue-level
conformational dynamics. Prior to performing deuterium exchange experiments, the
optimal proteolysis conditions were established as previously described to
maximize peptide sequence coverage of tag-cleaved, deglycosylated IZUMO1₂₂₋₂₅₄
and JUNO₂₀₋₂₂₈. Briefly, 1 µl of diluted protein stock solution (2 mg ml⁻¹ in
10 mM Tris (pH 7.2), 150 mM NaCl) was mixed with 5 µl quench buffer (6.4 M
GuHCl and 1.0 M TCEP in 0.8% (v/v) formic acid, 16.6% (v/v) glycerol). After
incubating on ice for various times (2, 5, 10, 15 and 30 min), the quenched
samples were mixed with 24 µl dilution buffer (0.8% (v/v) formic acid, 16.6%
(v/v) glycerol) and then subjected to proteolysis and LC-MS analysis. The
IZUMO1₂₂₋₂₅₄-JUNO₂₀₋₂₂₈ complex was formed by mixing IZUMO1₂₂₋₂₅₄ and
JUNO₂₀₋₂₂₈ at 1:1.2 or 1.2:1 molar ratios and incubating the samples at 22 °C for
2 h. Deuterium exchange was initiated by mixing 3.5 µl of protein stock solution
(IZUMO1₂₂₋₂₅₄, IZUMO1₂₂₋₂₅₄-JUNO₂₀₋₂₂₈, JUNO₂₀₋₂₂₈ or JUNO₂₀₋₂₂₈-
IZUMO1₂₂₋₂₅₄) with 7 µl D₂O buffer (8.3 mM Tris (pH 7.2), 150 mM NaCl in
D₂O, pDread 7.2) and incubating at 0 °C for 10, 100, 1,000, 10,000 and 100,000 s. At
indicated times, 2.1 µl of exchange samples were added to 10.5 µl quench solution
to stop the D₂O exchange reaction. After 5 min (IZUMO1₂₂₋₂₅₄ or IZUMO1₂₂₋₂₅₄-
JUNO₂₀₋₂₂₈) or 10 min (JUNO₂₀₋₂₂₈ or JUNO₂₀₋₂₂₈-IZUMO1₂₂₋₂₅₄)
incubation on ice, quenched samples were diluted by addition of 48 µl of ice-cold
dilution buffer, and then immediately frozen on dry ice and stored at
−80 °C. The non-deuterated control samples and equilibrium-deuterated control
samples were also prepared by mixing protein with H₂O buffer (8.3 mM
Tris (pH 7.2), 150 mM NaCl in H₂O) and equilibrium-deuterated buffer (0.8%
(v/v) formic acid in 99.9% D₂O), respectively. The frozen samples were then thawed at
5 °C and passed over an immobilized pepsin column (16-µl bed volume) at a
flow rate of 20 µl min⁻¹. The resulting peptides were collected on a C18 trap for
desalting and separated by a Magic AQ C18 reverse phase column (Michrom
BioResources) using a linear gradient of acetonitrile from 6.4% to 38.4% over
30 min. MS analysis was performed using the OrbiTrap Elite Mass Spectrometer
(ThermoFisher Scientific), with a capillary temperature of 200 °C. Data were
acquired in both data-dependent MS/MS mode and MS1 profile mode, and
the data were analysed by Proteome Discoverer software and DXMS Explorer
(Sierra Analytics Inc.).
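Two small calculations follow from the protocol above: the D₂O fraction during labelling, and the way non-deuterated and equilibrium-deuterated controls are commonly used to correct a peptide's deuterium uptake for back-exchange. The sketch below uses the widely applied correction D = N·(m_t − m_0)/(m_100 − m_0). The text does not state that DXMS Explorer applies exactly this formula, and the function names and peptide masses are hypothetical.

```python
def d2o_fraction(protein_ul, d2o_buffer_ul):
    """Volume fraction of D2O buffer in the labelling mixture."""
    return d2o_buffer_ul / (protein_ul + d2o_buffer_ul)


def corrected_deuteration(m_t, m_0, m_100, n_amides):
    """Back-exchange-corrected number of deuterons on a peptide.

    m_t:   centroid mass at exchange time t
    m_0:   centroid mass of the non-deuterated control
    m_100: centroid mass of the equilibrium-deuterated control
    n_amides: number of exchangeable backbone amides in the peptide
    """
    return n_amides * (m_t - m_0) / (m_100 - m_0)


# Labelling mixture from the protocol: 3.5 ul protein stock + 7 ul D2O buffer
labelling = d2o_fraction(3.5, 7.0)

# Hypothetical peptide with 8 exchangeable amides
d_level = corrected_deuteration(1003.0, 1000.0, 1006.0, 8)
```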
Extended Data Figure 1 | Conservation of IZUMO1 residues.
a, Alignment of IZUMO1 protein sequences from various mammals.
IZUMO1 sequences from Homo sapiens (human; GenBank: BAD91012.1),
Macaca mulatta (rhesus macaque; GenBank: EHH30233.1), Gorilla
gorilla (gorilla; Uniprot: G3QFY5), Pan paniscus (bonobo; NCBI:
XP_003814124.1), Callithrix jacchus (marmoset; Uniprot: F7H859),
Chlorocebus sabaeus (green monkey; Uniprot: A0A0D9S2Z4), Papio
anubis (baboon; Uniprot: AOAOAO0MU86), Nomascus leucogenys
(gibbon; Uniprot: G1QXF7), Mus musculus (mouse; GenBank:
BAD91011.1), Rattus norvegicus (rat; GenBank: BAD91013.1), Ictidomys
tridecemlineatus (squirrel; Uniprot: I3N2L9), Cavia porcellus (guinea pig;
Uniprot: H0UTJ7), Ochotona princeps (pika; NCBI: XP_004597241.1),
Oryctolagus cuniculus (rabbit; Uniprot: G1TVX5), Felis catus (cat;
NCBI: XP_006941089.1), Canis familiaris (dog; Uniprot: FEUM65),
Ailuropoda melanoleuca (giant panda; Uniprot: G1M882), Equus caballus
(horse; Uniprot: F6YE25), Bos taurus (cow; Uniprot: E1BDA8), Sus
scrofa (pig; Uniprot: F1RIQ7), Capra hircus (goat; Uniprot: C6ZEA2),
Ovis aries (sheep; Uniprot: W5PRD0), Sorex araneus (shrew; NCBI:
XP_004619786.1), Pteropus vampyrus (megabat; NCBI: XP_011372928.1),
Loxodonta africana (African elephant; NCBI: XP_003406572.1), and
Dasypus novemcinctus (armadillo; NCBI: XP_004451154.1) are aligned.
Red boxes indicate complete conservation of a given amino acid. N-linked
glycosylation sequons (N-X-S/T) are indicated by red-coloured Y-shaped
symbols. Secondary structural elements observed in the crystal structure
of IZUMO1 are shown as arrows for β-strands and coils for α-helices.
Residues that interact with JUNO are identified with asterisks, with those
that form salt bridges and hydrogen bonds highlighted in blue and green
boxes, respectively. Cysteine pairs involved in disulfide bond formation
are numbered in red underneath each sequence. b, Footprint of JUNO
on the molecular surface of IZUMO1. c, d, Representation of surface
residue conservation, calculated using ConSurf and the alignment of all
mammalian IZUMO1 (c) or primate-only IZUMO1 (d) sequences from
Extended Data Fig. 1a. Degree of residue conservation is coloured in a
gradient from high (burgundy) to low (cyan) conservation.
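The N-X-S/T sequons marked in the alignment can be located in a sequence with a short pattern search, sketched below. The caption only specifies N-X-S/T; excluding proline at the X position is an added assumption based on the commonly used definition, and the function name and test peptide are hypothetical.

```python
import re

# Zero-width lookahead so that overlapping sequons are all reported.
# Assumption: X may be any residue except proline.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence):
    """Return 1-based positions of the Asn residue in each N-X-S/T sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]


# Hypothetical peptide: N3-G-S and N10-A-T qualify, N6-P-T does not
sites = find_sequons("MANGSNPTQNAT")
```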
Extended Data Figure 2 | Conservation of JUNO residues. a, Alignment
of JUNO protein sequences from various mammals. JUNO/FOLR-δ
sequences from H. sapiens (human; NCBI: NP_001186135.1), M. mulatta
(rhesus macaque; NCBI: NP_001180734.1), G. gorilla (gorilla; NCBI:
XP_004052029.1), P. paniscus (bonobo; NCBI: XP_003813838.1),
C. jacchus (marmoset; NCBI: XP_009005477.1), C. sabaeus (green monkey;
Uniprot: A0A0D9S1B0), P. anubis (baboon; NCBI: XP_009185381.1),
N. leucogenys (gibbon; Uniprot: G1R639), M. musculus (mouse; NCBI:
NP_075026.1), R. norvegicus (rat; NCBI: XP_001072998.2),
I. tridecemlineatus (squirrel; NCBI: XP_005337246.1), C. porcellus
(guinea pig; NCBI: XP_003468609.1), Cricetulus griseus (Chinese hamster;
NCBI: XP_003506544.1), O. princeps (pika; NCBI: XP_012782378.1),
O. cuniculus (rabbit; Uniprot: G1T5D7), F. catus (cat; NCBI:
XP_011284828.1), C. familiaris (dog; Uniprot: E2RTK1), E. caballus
(horse; NCBI: XP_001491306.1), S. scrofa (pig; Uniprot: F1STK4),
C. hircus (goat; NCBI: XP_013824827.1), L. africana (African elephant;
NCBI: XP_010593777.1), and D. novemcinctus (armadillo; NCBI:
XP_004471965.1) are aligned. Red boxes indicate complete conservation
of a given amino acid. N-linked glycosylation sequons (N-X-S/T)
are indicated by red-coloured Y-shaped symbols. JUNO is anchored to
the plasma membrane through a GPI anchor at Ser228 (shown as a
green lollipop). Secondary structural elements observed in the crystal
structure of JUNO are shown as arrows for β-strands and coils for
α-helices. Residues that interact with IZUMO1 are identified with
asterisks underneath the sequence, with those that form salt bridges
and hydrogen bonds highlighted in blue and green boxes, respectively.
Cysteine pairs involved in disulfide bond formation are numbered in red
underneath each sequence. b, Footprint of IZUMO1 on the molecular
surface of JUNO. c, d, Representation of surface residue conservation,
calculated using ConSurf and the alignment of all mammalian JUNO (c)
or primate-only JUNO sequences (d) from Extended Data Fig. 2a. Degree
of residue conservation is coloured in a gradient from high (burgundy) to
low (cyan) conservation.
Extended Data Figure 3 | Purification and characterization of IZUMO1
and JUNO. a, Superdex-75 10/300 GL size-exclusion chromatograms
of JUNO₂₀₋₂₂₈, IZUMO1₂₂₋₂₅₄ and the IZUMO1₂₂₋₂₅₄-JUNO₂₀₋₂₂₈
complex. Eluted peak positions of protein standards are marked with
triangles and dashed lines. b, Coomassie-stained SDS-PAGE analysis of
the purified IZUMO1₂₂₋₂₅₄, JUNO₂₀₋₂₂₈ and IZUMO1₂₂₋₂₅₄-JUNO₂₀₋₂₂₈
complex. For gel source data, see Supplementary Fig. 1c. c, Size-exclusion
chromatography with inline multi-angle light scattering (SEC-MALS)
profile of glycosylated human IZUMO1₂₂₋₂₆₈. The detector response unit
(mV) and molecular mass (kDa) are plotted against the elution volume
from a Superdex-200 Increase 10/300 GL size exclusion column. SEC-MALS
reveals an apparent molecular mass of 34.8 kDa (dashed blue line),
which corresponds to a monomeric species. d, Surface plasmon resonance
(SPR) binding affinity and kinetic analysis of the human IZUMO1₂₂₋₂₅₄
and JUNO₂₀₋₂₂₈ interaction. Human JUNO₂₀₋₂₂₈ was amine-coupled to
the SPR sensor chip. Kinetic parameters were derived from a Langmuir
1:1 binding model. e, Biolayer interferometry (BLI) kinetic analysis
of the interaction between human IZUMO1₂₂₋₂₅₄ and JUNO₂₀₋₂₂₈.
Human JUNO₂₀₋₂₂₈ was biotinylated and coupled to streptavidin-coated
biosensors. Kinetic parameters were derived from a 1:1 binding model.
The experimental curves are shown in colour superimposed with the
fitted curves indicated as grey lines. f, A size distribution histogram from
dynamic light scattering (DLS) measurements of IZUMO1₂₂₋₂₅₄,
JUNO₂₀₋₂₂₈ and IZUMO1₂₂₋₂₅₄-JUNO₂₀₋₂₂₈ complex at 5 mg ml⁻¹.
IZUMO1₂₂₋₂₅₄, JUNO₂₀₋₂₂₈ and IZUMO1₂₂₋₂₅₄-JUNO₂₀₋₂₂₈ display
hydrodynamic radii (Rh) of ~3.0 nm, ~2.9 nm and ~3.9 nm, respectively.
g, Circular dichroism (CD) wavelength scan of human IZUMO1₂₂₋₂₆₈
(blue) at 25 °C shows mixed secondary structural characteristics. The
crystal structure of IZUMO1₂₂₋₂₆₈ aligns well with the secondary
structural content calculated from the CD spectrum (35% α-helical, 24%
β-strand and 41% random coil). A reconstructed CD wavelength scan (red)
illustrates the agreement of the fit used in secondary structural content
analysis. A CD thermal denaturation profile of human IZUMO1₂₂₋₂₆₈
at 222 nm is shown. The CD signal was normalized between 0 (folded)
and 1 (unfolded), and plotted as a function of temperature. The Tm value
indicates the midpoint of the melting transition.
Extended Data Figure 4 | Structural comparison of JUNO and the
folate receptor family of proteins. a, Structural superimposition of
JUNO₂₀₋₂₂₈ with FOLR-α (PDB ID: 4LRH) and FOLR-β (PDB ID: 4KMZ).
Experimentally bound folate (FOL), shown in white sticks, from the
FOLR-α structure is positioned in the active site. b, Superimposition of
residues in the folate-binding site of human FOLR-α and FOLR-β, and
equivalent residues in human JUNO. Residue names shown in black
are conserved among JUNO, FOLR-α and FOLR-β, and are numbered
on the basis of the FOLR-α sequence. Inset boxes highlight the residue
differences between JUNO, FOLR-α and FOLR-β. Key hydrogen bond
interactions are shown as dashed black lines. Mutagenesis studies showed
that replacement of D103 or D97 in FOLR-α or FOLR-β, respectively,
which form strong interactions with the N1 and N2 nitrogen atoms of the
pterin moiety, results in a decrease in affinity of more than one order of
magnitude. Six folate-binding residues observed in FOLR-α and FOLR-β
(FOLR-α/FOLR-β: D103/D97, W124/W118, R125/R119, V129/F123,
H157/H151, and K158/R152) are not conserved in JUNO. Four of these
residues (FOLR-α/FOLR-β: D103/D97, W124/W118, R125/R119, and
H157/H151) form key hydrogen bonds to anchor folate in the active site.
In JUNO, the substituted residues are not able to maintain the extensive
hydrogen bond network to folate seen in FOLR-α and FOLR-β.
c, H. sapiens FOLR-α (Uniprot: P15328), FOLR-β (Uniprot: P14207),
FOLR-γ (Uniprot: P41439) and FOLR-δ (Uniprot: A6ND01) are aligned.
Red boxes indicate complete conservation of a given amino acid. N-linked
glycosylation sequons (N-X-S/T) are indicated by red-coloured Y-shaped
symbols. JUNO is anchored to the plasma membrane through a GPI
anchor at Ser228 (shown as a green lollipop). Experimentally determined
secondary structural elements are shown as arrows for β-strands and coils
for α-helices. Key folate-binding residues, identified from the FOLR-α and
FOLR-β crystal structures, are identified with an asterisk underneath the
sequence. Key residue differences between JUNO, FOLR-α and FOLR-β
folate binding sites are highlighted in a blue box.
Extended Data Figure 5 | IZUMO1-JUNO interface. a, 2D schematic of
the interactions between IZUMO1₂₂₋₂₅₄ and JUNO₂₀₋₂₂₈. Residues from
the IZUMO1 4HB, hinge, and Ig-like regions and from JUNO are coloured
orange, green, blue and purple, respectively. Hydrogen-bond interactions
are shown as dashed lines, and van der Waals forces are depicted as grey
semi-circles. b, Footprints of JUNO on the surface of IZUMO1 and of
IZUMO1 on the surface of JUNO. The molecular surfaces of IZUMO1
and JUNO are coloured white with residues forming interactions coloured
as in a. No N-linked glycans on either IZUMO1₂₂₋₂₅₄ or JUNO₂₀₋₂₂₈ are
involved in binding. Formation of this interface results in a calculated free
energy gain of −10.4 kcal mol⁻¹.
Extended Data Figure 6 | Hybrid structural analysis of human IZUMO1
and JUNO in a solution state. a-c, Ab initio SAXS reconstruction,
experimental scattering curves, normalized pair distance distribution
function, P(r), and Kratky plot showing the degree of flexibility of
IZUMO1₂₂₋₂₅₄ (a), JUNO₂₀₋₂₂₈ (b), and the IZUMO1₂₂₋₂₅₄-JUNO₂₀₋₂₂₈
complex (c). No concentration-dependent or radiation effects were
observed in the SAXS data. The inset box in the experimental scattering
data shows linearity in the Guinier plot at low q (qRg < 1.3). The
IZUMO1₂₂₋₂₅₄, JUNO₂₀₋₂₂₈ and IZUMO1₂₂₋₂₅₄-JUNO₂₀₋₂₂₈ complex
crystal structures were docked into the SAXS reconstructed molecular
envelopes. The boomerang shape and upright conformation seen in the
crystal structures of unbound and bound IZUMO1₂₂₋₂₅₄, respectively,
were recapitulated by the SAXS reconstructions. d, Summary of the
experimentally derived SAXS parameters for IZUMO1₂₂₋₂₅₄, JUNO₂₀₋₂₂₈
and IZUMO1₂₂₋₂₅₄-JUNO₂₀₋₂₂₈. The program SCATTER was used to
calculate the radius of gyration (Rg) and maximum linear dimension
(Dmax), and to perform Porod-Debye analysis to obtain the Porod
volume and P coefficient. e, f, Comparative deuterium exchange mass
spectrometry (DXMS) profile of unbound and bound IZUMO1₂₂₋₂₅₄ (e)
and JUNO₂₀₋₂₂₈ (f). The plots reveal the change in individual deuterium
exchange for all observable residues. The coloured lines above the residue
numbers correspond to the observed regions in the crystal structures.
at low q (qRg < 1.3). The IZUMO1₂₂₋₂₅₄(WT)-JUNO₂₀₋₂₂₈(WT) complex
crystal structure was docked into the SAXS reconstructed molecular
envelopes. e, Summary of the experimentally derived SAXS parameters
for the various IZUMO1-JUNO complexes. The program SCATTER
was used to calculate the radius of gyration (Rg) and maximum linear
dimension (Dmax), and to perform Porod-Debye analysis to obtain the
Porod volume and P coefficient.


Extended Data Figure 8 | Comparison of IZUMO1 with selected viral
fusogens. A common feature of many viral fusogens is the presence
of a hydrophobic fusion peptide or fusion loop. a, Kyte and Doolittle
hydropathy plots were calculated for IZUMO1, HIV-1 gp160, influenza
A virus HA, Ebola virus glycoprotein (GP), Dengue virus type 2 E,
and herpes simplex virus-1 gB to detect the presence of hydrophobic
regions. Class I and class II viral fusion glycoproteins contain three clear
hydrophobic regions corresponding to the signal peptide (grey), fusion
peptide or loop (red) and the transmembrane anchor (blue). For class III
viral glycoproteins, the presence of a signal peptide and transmembrane
anchor is clear, but the hydrophobic fusion loop is formed by two
discontinuous regions. This results in a lower hydropathy score that is
more difficult to detect. Two regions of hydrophobic residues cluster at the
tip of the glycoprotein (shown in red) and are thought to be the internal
fusion loop. In all class I, II and III viral fusion glycoproteins, clustering 
of aromatic and hydrophobic residues in a loop or helical region is a 
hallmark feature of fusion proteins. In contrast, IZUMO1 clearly does not 
have any hydrophobic regions or structural features similar to the viral 
fusogens that could insert into the egg membrane. b, Molecular surface 
representation of class I, II, and III viral glycoproteins and IZUMO1. The 
fusion peptide or loop is shown as red sticks and also coloured red on the 
glycoprotein surface. For the class I viral glycoproteins, the metastable 
prefusion trimer is shown, with the receptor binding and fusion subunits 
shown in blue and green, respectively. For the class II and class III viral 
glycoproteins, the postfusion trimer is shown with three hydrophobic 
fusion loops clustered at the tip of the molecule. 
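
The hydropathy analysis in panel a can be illustrated with a minimal Python sketch of a Kyte and Doolittle sliding-window profile. The per-residue values are the published Kyte and Doolittle scale; the window length of 9, the function name and the toy sequence are assumptions for illustration only, since the legend does not state the settings that were used.

```python
# Kyte and Doolittle hydropathy index for the 20 standard amino acids.
KD_SCALE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Return (centre_position, mean_hydropathy) for each full window.

    Positions are 1-based residue numbers of the window centre.
    The window length (9) is an assumed default, not taken from the paper.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    seq = sequence.upper()
    values = [KD_SCALE[res] for res in seq]  # KeyError on non-standard residues
    half = window // 2
    profile = []
    for start in range(len(values) - window + 1):
        mean = sum(values[start:start + window]) / window
        profile.append((start + half + 1, mean))
    return profile


if __name__ == "__main__":
    # Hypothetical toy sequence: a polar stretch followed by a hydrophobic one.
    toy = "DDKKEESSNN" + "LLIVVALLIF"
    for position, score in hydropathy_profile(toy):
        print(position, round(score, 2))
```

In such a profile, a signal peptide, fusion peptide or transmembrane anchor would appear as a sustained run of positive window means, which is the feature the legend reports as absent from the IZUMO1 ectodomain.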
Extended Data Figure 9 | Model of IZUMO1 and JUNO in sperm-egg
fertilization. During fertilization, mature sperm undergo an acrosome
reaction and penetrate the egg zona pellucida to reach the
perivitelline space. The acrosome reaction also causes relocalization
of IZUMO1 to the sperm equatorial segment. a, IZUMO1 adopts a
monomeric boomerang conformation on the surface of the sperm
membrane. b, Upon binding to the JUNO egg receptor, IZUMO1
undergoes a conformational change. The 4HB region migrates towards
the egg membrane. Moreover, the hinge region of IZUMO1 becomes more
rigid and ‘locks’ the molecule into an upright position. The formation of
the IZUMO1 and JUNO complex provides a direct physical link between
the egg and sperm membranes. It is currently not clear whether IZUMO1
requires a post-JUNO binding event to trigger the fusion process, but
at least three potential mechanisms are possible. c, The heterotypic
assembly of IZUMO1 and JUNO, or a secondary conformational
change in IZUMO1, may bring the egg and sperm membranes into
close proximity for fusion to take place. d, Inoue et al. proposed that
subsequent to IZUMO1-JUNO binding, a protein disulfide isomerase
(PDI) catalyses a thiol-disulfide exchange reaction that leads to a structural
conformation change and dimerization of IZUMO1 (ref. 26). The
IZUMO1 dimer releases JUNO and contacts a yet-to-be-discovered oocyte
receptor that facilitates membrane fusion. e, Alternatively, IZUMO1
may act as a scaffold to recruit other sperm or egg protein partners to
form a multiprotein fusion complex in a manner similar to some viral
fusogens. f, The merger of the egg and sperm membranes will require
the apposition of the two bilayers to initiate mixing of the outer
membrane leaflets and formation of a hemifusion stalk. The hemifused
bilayers open to form the full fusion pore. g, Following fusion, JUNO is
rapidly shed into extracellular vesicles from the fertilized oocyte. Within
30-40 min, JUNO is weakly or barely detectable on the membrane surface
of zona-intact or anaphase II-stage zona-free fertilized oocytes, and
undetectable at the pronuclear stage. h, IZUMO1 binds JUNO tightly
and rapidly (BLI: Kd = 59 ± 1 nM, ka = 1.15 × 10° M⁻¹ s⁻¹; SPR: 48 ± 4 nM,
ka = 4.2 × 10° M⁻¹ s⁻¹), and once shed, JUNO is able to bind exposed
IZUMO1 on incoming acrosome-reacted sperm in the perivitelline space
to act as a ‘sperm-sink’ to block polyspermy.
Extended Data Table 1 | IZUMO1-JUNO binding interface mutations
All experiments were performed with technical triplicates (n = 3), with mean Kd values ± s.e.m.
Structure of IZUMO1-JUNO reveals sperm-oocyte
recognition during mammalian fertilization


Umeharu Ohto, Hanako Ishida, Elena Krayukhina, Susumu Uchiyama, Naokazu Inoue & Toshiyuki Shimizu


Fertilization is a fundamental process in sexual reproduction,
creating a new individual through the combination of male and
female gametes. The IZUMO1 sperm membrane protein and its
counterpart oocyte receptor JUNO have been identified as essential
factors for sperm-oocyte interaction and fusion. However, the
mechanism underlying their specific recognition remains poorly
defined. Here, we show the crystal structures of human IZUMO1,
JUNO and the IZUMO1-JUNO complex, establishing the structural
basis for the IZUMO1-JUNO-mediated sperm-oocyte interaction.
IZUMO1 exhibits an elongated rod-shaped structure comprised
of a helical bundle IZUMO domain and an immunoglobulin-like
domain that are each firmly anchored to an intervening β-hairpin
region through conserved disulfide bonds. The central β-hairpin
region of IZUMO1 provides the main platform for JUNO binding,
while the JUNO surface located behind its putative ligand-binding
pocket is involved in IZUMO1 binding. Structure-based
mutagenesis analysis confirms the biological importance of the
IZUMO1-JUNO interaction. This structure provides a major step
towards elucidating an essential phase of fertilization and it will
contribute to the development of new therapeutic interventions
for fertility, such as contraceptive agents.

Given that only one of many ejaculated spermatozoa (100-300 million
in humans) will fertilize an oocyte, several tightly regulated
molecular mechanisms must be integrated into the process of fertilization.
As the culmination of fertilization, gamete membrane fusion in
particular necessitates extremely robust regulation. IZUMO1 is a type
I transmembrane protein comprised of the IZUMO domain, which
contains α-helices and has been shown to be important for the sperm’s
oocyte adhesion capability, an immunoglobulin-like domain in the
extracellular region, and a short cytoplasmic tail. In contrast, JUNO
is a glycosylphosphatidylinositol-anchored folate receptor (FR) family
protein that lacks the ability to carry folic acid (Extended Data Figs 1
and 2). IZUMO1 and JUNO are ideal targets for contraceptive agents
because of their crucial involvement in fertilization.
Figure 1 | IZUMO1-JUNO interaction. a, Gel-filtration chromatography
analysis. Gel-filtration chromatograms (top) and SDS-PAGE analysis
stained with Coomassie blue (bottom). For gel source data, see
Supplementary Fig. 1. b, SV-AUC analysis. The normalized c(s)
distributions were plotted against the sedimentation coefficients, s20,w
(S). S, Svedberg (unit of sedimentation coefficient). Estimated molecular
weights are indicated. The observed sedimentation coefficient values for
IZUMO1 (2.5 S), JUNO (2.6 S), and the IZUMO1-JUNO complex (3.8 S)
corresponded well to the values calculated using the three-dimensional
coordinates of the IZUMO1 monomer (2.5 S), JUNO monomer (2.6 S), and
1:1 IZUMO1-JUNO complex (3.7 S), respectively. c, Isothermal titration
calorimetry analysis. All experiments were conducted at neutral pH.
The extracellular regions of human IZUMO1 and JUNO, each expressed
alone in S2 cells, were monomeric in solution, as confirmed by
size-exclusion chromatography and sedimentation velocity analytical
ultracentrifugation (SV-AUC) analyses (Fig. 1a, b). Isothermal titration
calorimetry analysis showed that IZUMO1 bound JUNO with a Kd
value of 91 nM and a stoichiometry (N) of approximately 1.0 at pH 7.5
(Fig. 1c, Extended Data Table 1), suggesting a 1:1 binding mode.
The estimated Mw of the IZUMO1-JUNO complex (s20,w = 3.8 S)
corresponded to a 1:1 complex (Fig. 1b). The IZUMO1-JUNO interaction
was not affected by deglycosylation (Extended Data Table 1).
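
To make the 1:1 binding mode concrete, the following Python sketch computes how much complex is expected at equilibrium for a given Kd. It uses the standard quadratic solution for a single-site equilibrium. The Kd of 91 nM is the value reported above and the 20 μM total concentrations are those of the SV-AUC mixture described in the Methods; the function name is hypothetical and the calculation is an illustration, not part of the reported analysis.

```python
import math


def complex_concentration(total_a, total_b, kd):
    """Equilibrium concentration of a 1:1 complex AB (same units as inputs).

    Solves Kd = [A][B]/[AB] with mass conservation, taking the
    physically meaningful (smaller) root of the quadratic.
    """
    if min(total_a, total_b) < 0 or kd <= 0:
        raise ValueError("concentrations must be >= 0 and kd > 0")
    s = total_a + total_b + kd
    return (s - math.sqrt(s * s - 4.0 * total_a * total_b)) / 2.0


if __name__ == "__main__":
    kd_um = 0.091          # 91 nM expressed in micromolar
    izumo1_um = 20.0       # total IZUMO1
    juno_um = 20.0         # total JUNO
    bound = complex_concentration(izumo1_um, juno_um, kd_um)
    print(f"complex: {bound:.2f} uM ({100 * bound / izumo1_um:.1f}% of IZUMO1 bound)")
```

Under these assumptions roughly 93% of each protein is in the complex, which is in line with a single dominant species in the c(s) distribution of the mixture.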

We determined the crystal structures of human IZUMO1, JUNO
(form 1 and form 2), and the IZUMO1-JUNO complex (form 1, form
2 and form 3) (Extended Data Tables 2 and 3). IZUMO1 exhibits an
elongated rod-shaped structure approximately 90 Å in length, comprised
of the N-terminal α-helical IZUMO domain (residues 22-134),
a central β-hairpin region (residues 135-163), and the C-terminal
immunoglobulin-like domain (residues 164-255) (Fig. 2a, Extended
Data Fig. 1). The IZUMO domain forms a four-helix bundle (α1-α4)
possessing up-down-up-down topology, in which pairwise α-helices
(α1-α2 and α3-α4) are arranged in an anti-parallel mode (Fig. 2a).
IZUMO1 contains ten cysteines, all conserved among mammalian
species (Extended Data Fig. 1) and all involved in disulfide
bond formation (Fig. 2a). The positions of the disulfide bonds were
confirmed in the anomalous difference Fourier maps based on
diffraction data collected at a 2.7 Å wavelength (Extended Data Fig. 3).
The flanking region of the α1 helix (C22-XX-C25) is tethered to the
loop (C149-XX-C152) of the central β-hairpin region by the
C22-C149 and C25-C152 disulfide bonds (Fig. 2a); the C-XX-C motifs
in both regions are conserved among species (Extended Data Fig. 1).
Similarly, the C139-C165 disulfide bond stabilizes the interaction
between the central β-hairpin region (β1 and β2) and the
immunoglobulin-like domain (Fig. 2a). The remaining disulfide bonds
(C135-C159 and C182-C233) stabilize the conformation of the central
region and immunoglobulin-like domain, respectively (Fig. 2a).
These disulfide bond-mediated interactions with the central β-hairpin
region restrict the relative orientation of the three regions despite the
absence of direct interactions between the IZUMO and
immunoglobulin-like domains (Extended Data Fig. 4a).
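
The disulfide topology described above can be summarized in a short Python sketch that assigns each cysteine to its region and reports which regions each bond connects. The region boundaries and bond pairs are taken from the text; the region labels and function names are hypothetical and chosen only for illustration.

```python
# Region boundaries (residue numbers) as stated in the text.
REGIONS = {
    "IZUMO domain": (22, 134),
    "beta-hairpin": (135, 163),
    "Ig-like domain": (164, 255),
}

# The five disulfide bonds listed in the text.
DISULFIDES = [(22, 149), (25, 152), (139, 165), (135, 159), (182, 233)]


def region_of(residue):
    """Return the name of the region containing a residue number."""
    for name, (first, last) in REGIONS.items():
        if first <= residue <= last:
            return name
    raise ValueError(f"residue {residue} is outside the crystallized construct")


def classify_bonds(bonds):
    """Map each bond to the pair of regions it connects."""
    return {bond: (region_of(bond[0]), region_of(bond[1])) for bond in bonds}


if __name__ == "__main__":
    for (cys_a, cys_b), (reg_a, reg_b) in classify_bonds(DISULFIDES).items():
        if reg_a == reg_b:
            print(f"C{cys_a}-C{cys_b}: within {reg_a}")
        else:
            print(f"C{cys_a}-C{cys_b}: links {reg_a} and {reg_b}")
```

The output makes the point of the paragraph explicit: every inter-region bond passes through the β-hairpin, and none connects the IZUMO domain directly to the immunoglobulin-like domain.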

The structure of human JUNO is similar to that of related FRs (Fig. 2b).
Eight α-helices (α1-α8) and four β-strands (β1-β4) in JUNO form
a single globular fold, which is stabilized by the eight disulfide bonds
conserved among FRs, riboflavin-binding protein, and JUNO.
Although a number of the loop regions (L0, L1 and L3 regions) are
disordered in the mouse JUNO structure, these same loop regions in human
JUNO form ordered structures in all molecules in the asymmetric
unit (Extended Data Fig. 4b). The human JUNO L2 region, which
corresponds to the inhibitory loop in FRs, is disordered (Fig. 2b),
possibly due to a 6-7 amino acid insertion not present in other species
(Extended Data Figs 1 and 2). Because of the high sequence identity
with FRs (~60%), JUNO is also thought to have a hydrophobic pocket
that corresponds to the folate binding pocket in FRs (Fig. 2b, Extended
Data Fig. 2). The JUNO pocket has a volume of 450 Å³ and is delimited
by six structural segments (L0-L4 regions and α3 helix). Although
most of the hydrophobic residues forming the folate binding pocket are
conserved in JUNO (Fig. 2b, Extended Data Fig. 2), it does not exhibit
any affinity for folate. In addition to the hydrophobic interactions, FRβ
forms multiple polar interactions with the folate pterin moiety via D97,
R119, H151, and S190 (Fig. 2b), which have been shown to be important
for folate binding in FRs. These residues are not conserved
in JUNO except for S190; corresponding residues for JUNO are A93,
Q122, R154, and S193, respectively (Fig. 2b, Extended Data Fig. 2).
Moreover, the side chain of W190 in JUNO exhibits a rotamer different
from that in FRs (W187 for FRβ) (Fig. 2b), resulting in narrowing of the
pocket. Therefore, JUNO would be unable to bind folate.

IZUMO1-JUNO exists as a 1:1 complex in crystals (Fig. 3a). The
three crystal forms of the IZUMO1-JUNO complex revealed essentially
Figure 2 | Structure of IZUMO1 and JUNO. a, Structure of human
IZUMO1. The Cys positions and disulfide bonds are indicated. The
disulfide bonds and N-acetylglucosamine (NAG) residue are shown as
stick representations. An NAG residue attached to N204 is located at
the tip of the β5-β6 hairpin region that turns outward from the β-sheet
consisting of β3, β5, β6, β8, and β9, which form the bottom of IZUMO1.
N204 is conserved among species and is important for protecting IZUMO1
from fragmentation in the cauda epididymis. Ig, immunoglobulin.
b, Structure of human JUNO. Top, structure of human JUNO (left) and
the FRβ-folate complex (PDB ID, 4KMZ) (right). The glycan attached to
N73, which is important for JUNO secretion, is located at the limb of the
pocket (L1 region). Bottom, magnified views of the JUNO hydrophobic
pocket (left) and FRβ folate binding pocket (right). The hydrogen bonds
are depicted with dashed lines.


the same architectures of the complex (Extended Data Fig. 4c),
despite the differences in crystallization conditions and protein samples
in terms of glycosylation and reductive alkylation of lysine residues
(Extended Data Tables 2 and 3). The structures of IZUMO1 and
JUNO are not substantially altered upon complex formation
(Extended Data Fig. 4c). On the IZUMO1 side, the IZUMO1-JUNO interface
is composed of the α2-α3 loop (IZUMO domain), the
central β-hairpin region (β1 and β2), and the β8-β9 loop
(immunoglobulin-like domain); on the JUNO side, it is composed of the
flanking region of the α1 helix, the α2 helix, the L1 region, the α3 helix,
and the N-terminal side of the L3 region (Fig. 3a, b, Extended Data Fig. 1).
Figure 3 | Structure of the IZUMO1-JUNO complex. a, Structure of
the human IZUMO1-JUNO complex. b, Schematic summary of
IZUMO1-JUNO interactions. Red and grey lines depict hydrogen bonds
and van der Waals contacts, respectively. c, Surface complementarity
of the IZUMO1-JUNO interface. JUNO (left) and IZUMO1 (right) are
shown in semi-transparent surface representation. d, Hydrogen bond
interaction between IZUMO1 and JUNO. e, f, Trp-mediated interactions


The central β-hairpin region of IZUMO1 acts as the main platform for
binding to the JUNO surface located behind the putative ligand-binding
pocket (Fig. 3a, b). IZUMO1 and JUNO utilize their complementary
surfaces for association, resulting in a contact area of 842 Å² (crystal
form 1) (Fig. 3c). Hydrophobic and van der Waals interactions are the
main contributors to the binding; in addition, six hydrogen bonds
contribute to the binding (Fig. 3d). Interactions through two Trp
residues, one on each protein, with good surface complementarity
can be defined: the side chain of IZUMO1 W148 interacts with L81,
L82, M83, and P84 of JUNO (Fig. 3c, left, e); and JUNO W62 interacts
with R160, K161, S162, and Y163 of IZUMO1 (Fig. 3c, right, f).
These Trp residues are conserved among all species (Extended Data
Fig. 1). The L1 and L3 regions of JUNO have been shown to be important
for interaction with IZUMO1. The conserved G80-L81 motif
in the L1 region undergoes a conformational change upon IZUMO1
binding (Extended Data Figs 1 and 4c), thus creating extensive contacts
with M75, V77, Y134, W148, and H157 of IZUMO1 (Fig. 3c,
right, e). Accordingly, mutant proteins of the interface residues,
especially W148A of IZUMO1 and W62A and L81A of JUNO, exhibited
reduced affinity (Fig. 1c, Extended Data Table 1).

Alongside our paper, another group has reported essentially
the same results. Their work further confirms the overall architecture
of the IZUMO1-JUNO complex and key residues of the interaction
by X-ray crystallography along with techniques complementary to
ours, including small angle X-ray scattering, deuterium exchange mass
spectrometry, biolayer interferometry, and surface plasmon resonance
analyses.

To confirm the functional relevance of the IZUMO1-JUNO inter- 
face in sperm-oocyte interactions, we conducted cell-oocyte binding 
assays using COS-7 cells expressing wild-type or mutant mouse 
Izumo1 containing single or multiple mutations in the binding 
between IZUMO1 and JUNO, focusing on W148 of IZUMO1 (e) and
W62 of JUNO (f). g, The interface of IZUMO1 (left) and JUNO (right).
The interface residues that are conserved between human and mouse are
coloured in red (left) and orange (right), and non-conserved residues in
yellow (left) and magenta (right). The residues of mouse Izumo1 and Juno
are shown in parentheses.


interface for JUNO (W148A, K154A, H157A, I158R, R160A, and
L163A) (Fig. 3b). The cell surface distribution of the proteins was
confirmed by immunostaining (Extended Data Fig. 5). The average
number of attached cells was significantly reduced by the introduction of
a single mutation in Izumo1 (Fig. 4). In particular, mutations in conserved
amino acid residues (W148A, H157A, and R160A) had
the greatest impact on oocyte binding, consistent with the Kd values
of the IZUMO1 W148A mutant (Fig. 1c) and structural observations
(Fig. 3). With multiple mutations, COS-7 cells could no longer bind to
the egg surface, thus demonstrating the functional importance of the
IZUMO1-JUNO interface.

The IZUMO1-JUNO interaction has a certain degree of species
specificity. For example, human IZUMO1 can interact with hamster
Juno but not with mouse Juno. The IZUMO1-JUNO interface
includes both conserved and non-conserved residues (Fig. 3g,
Extended Data Fig. 1). While the conserved structural features such as
the Trp-mediated interactions described above ensure the conserved
binding mode, the variable regions in the interface determine the
species specificity of the IZUMO1-JUNO interaction.

It has been hypothesized that after the initial IZUMO1-JUNO
binding, IZUMO1 forms a closed dimer and simultaneously releases
JUNO. Although IZUMO1 tends to dimerize at high concentrations,
no further oligomerization of the IZUMO1-JUNO complex in solution
was detected by SV-AUC analysis, even at high concentrations
(~100 μM) (Extended Data Fig. 6). Thus, the current IZUMO1-JUNO
structure represents the initial gamete recognition state and further
structural conversion of the complex would occur during fertilization.
The conformation of the central region of IZUMO1 important
for JUNO binding is physically restricted by neighbouring domains
through partially exposed disulfide bonds (Fig. 2a). Accordingly,
IZUMO1 exhibited high sensitivity to reducing agents and easily lost
Figure 4 | IZUMO1 containing interface mutations abolishes cell-oocyte
binding. Cell-oocyte assays to evaluate mouse IZUMO1-JUNO
interactions were performed using transfected COS-7 cells. Bright-field
images were taken 1 h post incubation. Scale bars, 100 μm. Attached cell count
values (excluding aggregated cells) from three independent experiments
are presented (bottom). The red line indicates the average. *P < 0.0001
between wild-type IZUMO1 and its mutants; Student’s t-test. Sample
size (no. of oocytes), average and error bars are discussed further in the
Methods.


its native conformation, while JUNO did not (Extended Data Fig. 7a). 
Interestingly, the IZUMO1-JUNO interaction was weaker at acidic pH 
than at neutral pH (Extended Data Fig. 7b, Extended Data Table 1). 
As FRs undergo pH-dependent conformational change, JUNO may
also exhibit a distinct conformation at low pH that is incapable of
binding to IZUMO1. These properties of IZUMO1 and JUNO may
be involved in the regulation of the IZUMO1-JUNO interaction. 
Alternatively, unidentified factor(s) on the oocyte could be involved in 
the subsequent fertilization steps. Additional studies will be required 
to elucidate the processes following the initial encounter of IZUMO1 
and JUNO. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


No statistical methods were used to predetermine sample size. The experiments 
were not randomized and investigators were not blinded to allocation during 
experiments and outcome assessment. 

Protein expression, purification and crystallization. The DNA encoding
human IZUMO1 (residues 22-255, Uniprot accession number Q8IYV9) and
human JUNO (residues 20-228, Uniprot accession number A6ND01) with a
C-terminal PreScission protease cleavage site followed by a Protein A tag was
inserted into the expression vector pMT/BiP/V5-His of the Drosophila Expression
System. Drosophila S2 cells were co-transfected with the IZUMO1 or JUNO vector
and the pCoHygro vector. Stably transfected cells were selected in Sf-900 II SFM medium
containing 300 μg ml⁻¹ hygromycin. Protein expression was induced by adding
0.5 mM CuSO4 in Express Five SFM medium. Culture supernatant was harvested
160-240 h after induction. Protein was captured by IgG Sepharose 6 Fast Flow
(GE Healthcare) equilibrated with phosphate buffered saline (PBS), washed with
ten column volumes of PBS, and eluted with 0.1 M glycine-HCl pH 3.5 and 0.15 M
NaCl. The eluate was immediately neutralized by adding 1/20 volume of 1 M
Tris-HCl pH 8.0 and was concentrated to 1-3 mg ml⁻¹. For JUNO, protein was
purified by HiTrap Q (GE Healthcare) anion exchange chromatography at pH 8.5.
Bound protein was eluted with a linear NaCl gradient (0 to 0.6 M). Protein was
incubated with 1/20 (w/w) PreScission protease for 3 h at 277 K to cleave the Protein
A tag and further purified on a Superdex 200 gel filtration column (GE Healthcare)
equilibrated with 10 mM Tris-HCl pH 7.5 and 0.15 M NaCl. For the preparation
of deglycosylated samples of IZUMO1 and JUNO, the culture medium was
supplemented with kifunensine (1.8 μg ml⁻¹) to produce protein with
endoglycosidase-susceptible N-glycans and the purification steps included
deglycosylation. After tag cleavage, 1/10 volume of 1 M MES-NaOH pH 6.5
and 1-2 U endo Hf (New England Biolabs) per mg of protein were added and the
protein was incubated for 3 h at room temperature. The crystallization samples
with reductive alkylation of lysine residues were prepared using the Reductive
Alkylation Kit (Hampton Research) according to the manufacturer's protocol.

Crystallization experiments were performed with the sitting-drop
vapour-diffusion method at 293 K. The crystallization droplets were made by mixing
equal volumes of protein solution and reservoir solution, typically around 0.2-1.0 μl.
The crystallization conditions are summarized in Extended Data Table 2. The Os
derivative of IZUMO1 crystals was prepared by soaking the IZUMO1 crystals
in a solution containing 10 mM K2OsO4 in mother liquor (12% (w/v) PEG6000,
0.1 M sodium citrate pH 5.6, 90 mM NaCl) for 1 h. The Pt derivative of IZUMO1
crystals was prepared by soaking the IZUMO1 crystals in a solution containing
10 mM K2PtCl4 in the mother liquor for 1 minute.

Data collection and structure determination. Diffraction data sets were
collected on beamlines PF-1A and PF-AR NE3A (Ibaraki, Japan) under cryogenic
conditions at 100 K. Crystals were soaked in cryoprotectant solution
(Extended Data Table 2) and then flash-cooled under a cold gas stream. The
diffraction data sets were processed using the HKL2000 package or XDS.
Phasing and initial model building of the IZUMO1 crystal structure were
performed using autoSHARP and ARP/wARP, respectively, followed by iterative
cycles of manual model building using the COOT program and restrained
refinement using REFMAC until the R factor converged. The initial model for
the JUNO structure (form 1) was obtained by the molecular replacement method
using the MOLREP program with the coordinates of FRβ (PDB ID: 4KMY)
and was further refined similarly to the IZUMO1 structure. The initial models
for the remaining structures were obtained by the molecular replacement method
using the refined coordinates of IZUMO1 and JUNO (form 1). The quality of
the final structures was evaluated with MolProbity. The statistics of the data
collection and refinement are summarized in Extended Data Table 3. Figures
were prepared with PyMOL. The pocket volume of JUNO was calculated using
CASTp.

Isothermal titration calorimetry. ITC experiments were performed at 298 K
using a MicroCal iTC200 (GE Healthcare) in a buffer composed of 10 mM Tris-HCl
pH 7.5 and 0.15 M NaCl (Fig. 1c) or 10 mM MES-NaOH pH 5.5 and 0.15 M
NaCl (Extended Data Fig. 7b). JUNO at a concentration of 100 μM was titrated
into 10 μM of IZUMO1. The titration sequence included a single 0.4 μl injection
followed by 19 injections of 2 μl each, with a spacing of 120 s between injections.
OriginLab software (GE Healthcare) was used to analyse the raw ITC data.
Thermodynamic parameters were extracted from curve fitting analysis with a
single-site binding model.
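
As a worked illustration of the quantities involved, the Python sketch below converts a dissociation constant into a standard binding free energy and summarizes the injection schedule described above. It assumes the usual relation ΔG = RT ln(Kd / 1 M) and the conventional ITC c value, N × [cell] / Kd; the function names are hypothetical, and the calculation is not a substitute for the curve fitting performed with the instrument software.

```python
import math

R_GAS = 8.314462618  # J mol^-1 K^-1


def binding_free_energy(kd_molar, temperature_k):
    """Standard binding free energy (J/mol) from a dissociation constant.

    Uses dG = RT ln(Kd / 1 M); more negative means tighter binding.
    """
    if kd_molar <= 0 or temperature_k <= 0:
        raise ValueError("kd and temperature must be positive")
    return R_GAS * temperature_k * math.log(kd_molar)


def total_injected_volume(first_ul, n_following, following_ul):
    """Total titrant volume (microlitres) for the injection schedule."""
    return first_ul + n_following * following_ul


if __name__ == "__main__":
    dg = binding_free_energy(91e-9, 298.0)      # Kd = 91 nM at 298 K
    print(f"dG = {dg / 1000:.1f} kJ/mol")
    print(f"titrant injected: {total_injected_volume(0.4, 19, 2.0):.1f} ul")
    # c value, N * [cell] / Kd, with N = 1 and 10 uM IZUMO1 in the cell.
    print(f"c value = {1.0 * 10e-6 / 91e-9:.0f}")
```

With the reported Kd of 91 nM at 298 K this gives a free energy of about -40 kJ mol⁻¹, and the schedule delivers 38.4 μl of JUNO in total.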

Sedimentation velocity analytical ultracentrifugation (SV-AUC). SV-AUC
experiments were performed at 20 °C in a ProteomeLab XL-I analytical
ultracentrifuge (Beckman Coulter) at 42,000 r.p.m. using absorbance detection. The
collected data were analysed using the continuous c(s) distribution model of SEDFIT,
fitting for the frictional ratio, meniscus, and time-invariant noise and using a
regularization level of 0.68. The buffer density and viscosity and the partial
specific volumes of IZUMO1 and JUNO were calculated using the program
SEDNTERP 1.09 and were 1.00852 g ml⁻¹ and 1.0256 cP, 0.7217 cm³ g⁻¹, and
0.7349 cm³ g⁻¹, respectively. The partial specific volume of the IZUMO1-JUNO
complex was estimated using the program UltraScan-SOMO and was
0.728 cm³ g⁻¹.

To study the complex formation between IZUMO1 and JUNO, experiments
were conducted with 20 μM of each protein individually and as a mixture
(Fig. 1b). The concentration-dependent dimerization of IZUMO1 was revealed
through measurements performed at protein concentrations of 10, 20 and 100 μM
(Extended Data Fig. 6a). The absence of concentration dependence of
IZUMO1-JUNO complex formation was demonstrated using mixtures containing 10, 20 and
100 μM of each respective protein (Extended Data Fig. 6b). In addition, SV-AUC
experiments were conducted for the mixture of deglycosylated proteins (Extended
Data Fig. 6b). Theoretical values of sedimentation coefficients of IZUMO1, JUNO
and the IZUMO1-JUNO complex were calculated from the three-dimensional
structures using the program UltraScan-SOMO.
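
The way a sedimentation coefficient and a frictional ratio together imply a molar mass can be sketched with the Svedberg relation combined with Stokes friction for an equivalent sphere. The Python example below is a simplified illustration under stated assumptions, not the SEDFIT implementation: it takes the s20,w of the complex (3.8 S) and its partial specific volume from the text, uses standard values for water at 20 °C, and uses a frictional ratio of 1.4 that is purely hypothetical because the fitted value is not reported here.

```python
import math

AVOGADRO = 6.02214076e23  # mol^-1


def sedimentation_coefficient(molar_mass, vbar, density, viscosity_poise, f_ratio):
    """Sedimentation coefficient in seconds (CGS units throughout).

    molar_mass in g/mol, vbar in cm^3/g, density in g/cm^3,
    viscosity in poise, f_ratio = f/f0 (dimensionless).
    """
    radius = (3.0 * molar_mass * vbar / (4.0 * math.pi * AVOGADRO)) ** (1.0 / 3.0)
    friction = f_ratio * 6.0 * math.pi * viscosity_poise * radius
    return molar_mass * (1.0 - vbar * density) / (AVOGADRO * friction)


def molar_mass_from_s(s_seconds, vbar, density, viscosity_poise, f_ratio):
    """Invert the relation above to estimate molar mass (g/mol)."""
    unit_radius = (3.0 * vbar / (4.0 * math.pi * AVOGADRO)) ** (1.0 / 3.0)
    m_two_thirds = (s_seconds * AVOGADRO * 6.0 * math.pi * viscosity_poise
                    * f_ratio * unit_radius) / (1.0 - vbar * density)
    return m_two_thirds ** 1.5


if __name__ == "__main__":
    s20w = 3.8e-13             # 3.8 S for the complex, in seconds
    vbar = 0.728               # cm^3/g, value given above for the complex
    water_density = 0.99823    # g/cm^3 at 20 C
    water_viscosity = 0.01002  # poise at 20 C
    assumed_f_ratio = 1.4      # hypothetical; not a value from the paper
    mass = molar_mass_from_s(s20w, vbar, water_density, water_viscosity, assumed_f_ratio)
    print(f"estimated molar mass: {mass / 1000:.0f} kDa")
```

Because the estimate scales with the frictional ratio to the power 3/2, the result is sensitive to the assumed shape, which is why the frictional ratio is fitted together with the c(s) distribution.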

Mice, antibodies and cultured cells. Eight- to twelve-week-old B6D2F1
(a cross between female C57BL/6 and male DBA/2) female mice were purchased
from Japan SLC, Inc. All animal experiments were approved by the Animal Care
and Use Committee of Fukushima Medical University, Japan. Anti-mouse
IZUMO1 monoclonal antibody (Mab18), generated as described previously, was
conjugated with Alexa Fluor 488 using an antibody labelling kit (Thermo Fisher
Scientific). COS-7 cells (African green monkey kidney fibroblast-like cell
line) were obtained from RIKEN BRC. This cell line has been confirmed to be
mycoplasma free by PCR.

Cell-oocyte assay. cDNAs of mouse Izumo1 and its mutants (W148A, K154A, H157A,
I158R, R160A, L163A, W148A/R160A, and the sextuple mutant (W148A/K154A/
H157A/I158R/R160A/L163A)), which were created using KOD-plus-neo
Mutagenesis (Toyobo), were ligated into the mammalian expression
vector pCXN-2. These constructs were verified by DNA sequencing. COS-7 cells
were transiently transfected with these constructs using the polyethylenimine (PEI)
method. To verify cell surface expression of IZUMO1 proteins (Extended
Data Fig. 5), IZUMO1 was detected with a mouse IZUMO1-specific monoclonal
antibody, Mab18, conjugated with Alexa Fluor 488 (green) without plasma
membrane permeabilization. The final concentration of antibody
added was 0.5 μg ml⁻¹. Nuclei were stained with Hoechst 33342. After two days,
transfected COS-7 cells were collected with 5 mM EDTA-PBS, washed three
times with PBS, and suspended in TYH medium (LSI Medience). For preparation
of zona-free eggs, B6D2F1 female mice (>8 weeks old) were superovulated
with an injection of 7.5 IU of human chorionic gonadotropin (hCG) 48 h after a
7.5-IU injection of equine chorionic gonadotropin (eCG). The eggs were
collected from the oviduct 16 h after the hCG injection. Eggs were placed in a
200-μl drop of TYH medium. The zona pellucida was removed from eggs by
treatment with 1.0 mg ml⁻¹ of collagenase (Wako). Zona-free eggs were
incubated with transfected COS-7 cells at 37 °C in TYH medium for 1 h and the
attached cells were counted under an inverted microscope after brief washing
by pipetting (68 oocytes for wild-type mouse IZUMO1 (average attached
cells: 13.99 ± 0.32), 82 oocytes for W148A (average attached cells: 0.12 ± 0.04),
54 oocytes for K154A (average attached cells: 6.0 ± 0.27), 72 oocytes for H157A
(average attached cells: 0.38 ± 0.08), 69 oocytes for I158R (average attached
cells: 1.78 ± 0.21), 81 oocytes for R160A (average attached cells: 0.82 ± 0.13),
54 oocytes for L163A (average attached cells: 6.81 ± 0.28), 84 oocytes for
W148A/R160A (average attached cells: 0.1 ± 0.04), and 71 oocytes for W148A/
K154A/H157A/I158R/R160A/L163A (average attached cells: 0); three
independent experiments).
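
The comparison between wild type and a mutant can be reproduced approximately from the summary values listed above. The Python sketch below computes a pooled-variance (Student's) t statistic from means, errors and sample sizes. It assumes that the ± values are s.e.m. and that individual oocytes are the unit of replication, neither of which is stated explicitly in this section; the function name is hypothetical.

```python
import math


def pooled_t_from_summary(mean1, sem1, n1, mean2, sem2, n2):
    """Student's (equal-variance) t statistic and degrees of freedom.

    Inputs are group means, standard errors of the mean and sample sizes.
    """
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    var1 = sem1 ** 2 * n1          # s.e.m. = sd / sqrt(n)  =>  var = sem^2 * n
    var2 = sem2 ** 2 * n2
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    std_err = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / std_err, df


if __name__ == "__main__":
    # Wild type versus W148A, using the counts listed above.
    t_stat, df = pooled_t_from_summary(13.99, 0.32, 68, 0.12, 0.04, 82)
    print(f"t = {t_stat:.1f} with {df} degrees of freedom")
```

A P value then follows from the t distribution with the returned degrees of freedom. Given how different the two group variances are under these assumptions, an unequal-variance test would be a reasonable alternative.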
[Alignment panel (image not reproduced): partial JUNO sequence rows for Macaca mulatta, Equus caballus, Bos mutus, Oryctolagus cuniculus and Mus musculus.]
Extended Data Figure 1 | Sequence alignments of IZUMO1 and
JUNO. Sequence alignments of IZUMO1 (top) and JUNO (bottom)
from Homo sapiens, Macaca mulatta, Equus caballus, Bos taurus or Bos
mutus, Oryctolagus cuniculus and Mus musculus are shown. Secondary
structure elements are displayed above the sequences. The residues of
the IZUMO1-JUNO interface are indicated by yellow highlighting. The
sequence identities between human and mouse IZUMO1 and JUNO are
59% and 69%, respectively, for the extracellular regions used in this study.
In contrast, the interface residues exhibit less conservation (35% and 53%
identity for IZUMO1 and JUNO, respectively). The N-glycosylation sites
are indicated by blue Y-shaped characters. Alignments were performed
using Clustal Omega software (EMBL-European Bioinformatics Institute).
Residues are coloured to indicate the degree of similarity: red residues are
those with the highest similarity, followed by green, blue, and black (lowest
similarity).
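
Percent identity figures of the kind quoted in this legend can be computed from a pairwise alignment as sketched below in Python. The definition used here, matches divided by the number of columns in which neither sequence has a gap, is one common convention and is an assumption; the legend does not state which denominator was used. The optional column selection mimics restricting the comparison to interface positions, and the sequences in the usage example are invented.

```python
def percent_identity(aligned_a, aligned_b, columns=None, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap.

    columns: optional iterable of 0-based alignment columns to restrict the
    comparison to (for example, interface positions).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    indices = range(len(aligned_a)) if columns is None else columns
    compared = matches = 0
    for i in indices:
        res_a, res_b = aligned_a[i].upper(), aligned_b[i].upper()
        if res_a == gap or res_b == gap:
            continue
        compared += 1
        matches += res_a == res_b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Hypothetical aligned fragments, not taken from the figure.
    human = "ACDE-FGHIK"
    mouse = "ACDQGFGHLK"
    print(percent_identity(human, mouse))                  # all ungapped columns
    print(percent_identity(human, mouse, columns=[3, 8]))  # a chosen subset
```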
[Alignment panel (image not reproduced): C-terminal rows of the JUNO and FR alignment, residue numbering 190-250.]

Extended Data Figure 2 | Sequence alignment of JUNO and folate
receptor (FR). Sequence alignment of JUNO from Homo sapiens, Equus
caballus and Mus musculus and FRα, FRβ, and FRγ from Homo sapiens.
Secondary structure elements for JUNO and FR are displayed above
and below the sequences, respectively. The residues involved in
folate binding in FRs are indicated by yellow highlighting. Alignments
were performed using Clustal Omega software (EMBL-European
Bioinformatics Institute). Residues are coloured to indicate the degree of
similarity: red residues are those with the highest similarity, followed by
green, blue, and black (lowest similarity).
Extended Data Figure 3 | Anomalous difference Fourier maps from
S-SAD data. The anomalous difference Fourier maps from the data
collected at 2.7 Å wavelength, contoured in green at the 3σ level, are
superposed onto the refined model of the IZUMO1-JUNO complex.
The disulfide bonds are shown in stick representations and indicated by
green (IZUMO1) and cyan (JUNO) labels.
Extended Data Figure 4 | Superpositions of IZUMO1 and JUNO
structures. a, Superposition of the six IZUMO1 molecules (chains A-F) in
the asymmetric unit. b, Superposition of the two (form 1; chains A and B)
and four (form 2; chains A-D) JUNO molecules in the asymmetric unit.
c, Superposition of IZUMO1 (chain A) and JUNO (chain A) structures
onto the IZUMO1-JUNO complex structure (form 1, form 2, and form 3).
IZUMO1 (chain A) and JUNO (chain A) are coloured grey. The IZUMO1-JUNO
complexes in the form 1, form 2, and form 3 (chains A-B and chains
C-D) crystals are coloured green, cyan, orange, and magenta, respectively.
The r.m.s.d. values for each pair of molecules are shown schematically.
Extended Data Figure 5 | Cell surface expression of IZUMO1 mutants in COS-7 cells (related to Fig. 4). Surface localization of wild-type IZUMO1 or
its mutants in COS-7 cells. IZUMO1 proteins on the cell surface (green) and nuclei (blue) were stained. Scale bars, 20 μm.
Extended Data Figure 6 | Dimerization of IZUMO1 and
IZUMO1-JUNO complex analysed by SV-AUC and size-exclusion
chromatography. a, b, The oligomerization states of IZUMO1 (a) and
IZUMO1-JUNO complex (b) were analysed by SV-AUC at various
concentrations. The normalized c(s) distributions were plotted against
the sedimentation coefficients, s20,w (S). The observed sedimentation
coefficient of the IZUMO1 dimer (3.2 S) at a high concentration
(100 μM) in a was smaller than the expected value for IZUMO1 dimer
(3.7 S), owing to the fast monomer-dimer interconversion kinetics.
c, The oligomerization of IZUMO1 was analysed by gel-filtration
chromatography. In each experiment, 20, 100 or 740 μM IZUMO1 (total
volume of 50 μl) was injected into a Superdex 200 Increase 5/150 GL
gel-filtration column (running buffer: 10 mM Tris-HCl pH 7.5 and
150 mM NaCl).
Extended Data Figure 7 | Effect of reducing agent and acidic pH on the
IZUMO1-JUNO interaction. a, DTT-induced aggregation of IZUMO1.
Each sample was incubated at room temperature for 2 h in the presence of
1 mM DTT, and then injected into a Superdex 200 Increase 5/150 GL
gel-filtration column (running buffer: 10 mM Tris-HCl pH 7.5, 150 mM
NaCl and 1 mM DTT). Eluents were analysed by SDS-PAGE. For gel source
data, see Supplementary Fig. 1. b, The interaction between IZUMO1 and
JUNO at acidic pH. The IZUMO1-JUNO interaction at acidic pH was
analysed by size-exclusion chromatography (SEC; top) and isothermal
titration calorimetry (bottom) as in Fig. 1. In SEC analysis, each sample
was injected into a Superdex 200 Increase 5/150 GL gel-filtration column
(running buffer: 10 mM MES-NaOH pH 5.5 and 150 mM NaCl). Eluents
were analysed by SDS-PAGE. c, The SEC chromatogram at a neutral pH
of 7.5 is shown as a control experiment.
Extended Data Table 1 | Isothermal titration calorimetry results

Cell             pH    Titrant        Kd (nM)    ΔH (kcal/mol)   ΔS (cal/mol/K)   N
IZUMO1 (WT)      7.5   JUNO (WT)      91 ± 11    -12 ± 0.2       -7.9 ± 0.8       1.0 ± 0.0
IZUMO1 (WT)      7.5   JUNO (W62A)    360        -7.8            3.2              1.1
IZUMO1 (WT)      7.5   JUNO (L66A)    130        -9.2            0.7              1.2
IZUMO1 (WT)      7.5   JUNO (L81A)    3,300      -4.1            11.3             1.0†
IZUMO1 (WT)      7.5   JUNO (M83A)    240        -10.9           -6.3             1.0
IZUMO1 (W148A)   7.5   JUNO (WT)      3,200      -0.8            22.6             1.0†
IZUMO1 (V156A)   7.5   JUNO (WT)      89         -14.6           -16.7            0.6
UMOLWT 5 NOW — sigags 92408 14091400
IZUMO1 (WT)      5.5   JUNO (WT)      16,000     -19.9           8.9              1.0†

Results are expressed as means ± standard deviation.
†N was fixed at 1.0.
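
The thermodynamic columns of this table are linked by ΔG = RT ln(Kd) and ΔG = ΔH - TΔS. The sketch below shows that relationship under stated assumptions: the measurement temperature is not given in this section, so 25 °C (298.15 K) is assumed, and the function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def delta_g_from_kd(kd_nM: float, temp_K: float = 298.15) -> float:
    """Binding free energy (kcal/mol) from a dissociation constant in nM.

    Assumption: 1 M standard state and a temperature of 298.15 K unless given.
    """
    if kd_nM <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temp_K * math.log(kd_nM * 1e-9)


def delta_s_from_dh(kd_nM: float, dh_kcal: float, temp_K: float = 298.15) -> float:
    """Entropy change in cal/(mol*K), rearranged from dG = dH - T*dS."""
    dg = delta_g_from_kd(kd_nM, temp_K)
    return 1000.0 * (dh_kcal - dg) / temp_K


# Wild-type pair from the table: Kd = 91 nM, dH = -12 kcal/mol.
print(round(delta_g_from_kd(91.0), 2))
print(round(delta_s_from_dh(91.0, -12.0), 1))
```

With the assumed temperature, the wild-type and W148A rows give entropy values close to the tabulated -7.9 and 22.6 cal/mol/K; small differences are expected from rounding of the printed Kd and ΔH.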
Extended Data Table 2 | Crystallization and cryoprotectant conditions

IZUMO1*
Protein solution: 8.0 mg/ml IZUMO1; 10 mM Tris-HCl pH 7.5; 0.15 M NaCl
Reservoir solution: 12-15% (w/v) PEG6000; 0.1 M sodium citrate pH 5.6
Cryoprotectant solution: 12% (w/v) PEG6000; 0.1 M sodium citrate pH 5.6; 0.09 M NaCl; 25% ethylene glycol

JUNO, Form 1
Protein solution: 20 mg/ml JUNO; 10 mM Tris-HCl pH 7.5; 0.15 M NaCl
Reservoir solution: 20% (w/v) PEG3350; 0.24 M malonate pH 7.0
Cryoprotectant solution: 20% (w/v) PEG3350; 0.24 M malonate pH 7.0; 0.09 M NaCl; 25% ethylene glycol

JUNO, Form 2*
Protein solution: 8.0 mg/ml JUNO; 10 mM Tris-HCl pH 7.5; 0.15 M NaCl
Reservoir solution: 3% (w/v) PEG4000; 5% 2-propanol; 0.1 M HEPES-NaOH pH 7.5
Cryoprotectant solution: 3% (w/v) PEG4000; 5% 2-propanol; 0.1 M HEPES-NaOH pH 7.5; 0.1 M NaCl; 25% glycerol

IZUMO1-JUNO complex, Form 1*
Protein solution: 4.1 mg/ml IZUMO1; 4.3 mg/ml JUNO; 10 mM Tris-HCl pH 7.5; 0.15 M NaCl
Reservoir solution: 8-12% PEG4000; 0.2 M Li2SO4; 0.1 M MES-NaOH pH 6.0
Cryoprotectant solution: 12% PEG4000; 0.12 M Li2SO4; 0.06 M MES-NaOH pH 6.0; 0.1 M NaCl; 25% ethylene glycol

IZUMO1-JUNO complex, Form 2†
Protein solution: 5.0 mg/ml IZUMO1; 5.0 mg/ml JUNO; 10 mM Tris-HCl pH 7.5; 0.15 M NaCl
Reservoir solution: 1.0 M (NH4)2SO4; 1.0 M KCl; 0.1 M HEPES-NaOH pH 7.0
Cryoprotectant solution: 1.0 M (NH4)2SO4; 1.0 M KCl; 0.1 M HEPES-NaOH pH 7.0; 0.09 M NaCl; 25% ethylene glycol

IZUMO1-JUNO complex, Form 3‡
Protein solution: 5.0 mg/ml IZUMO1; 5.0 mg/ml JUNO; 10 mM Tris-HCl pH 7.5; 0.15 M NaCl
Reservoir solution: 20% PEG4000; 0.2 M Li2SO4; 0.1 M MES-NaOH pH 6.0
Cryoprotectant solution: 20% PEG4000; 0.2 M Li2SO4; 0.1 M MES-NaOH pH 6.0; 0.09 M NaCl; 25% ethylene glycol

*Deglycosylated samples were used for crystallizations.
†Isopropylated IZUMO1 (undeglycosylated) and methylated JUNO (deglycosylated) were used for crystallization.
‡Ethylated IZUMO1 (deglycosylated) and ethylated JUNO (undeglycosylated) were used for crystallization.
Extended Data Table 3 | Data collection and refinement statistics

Data sets (columns, left to right): IZUMO1* Native; IZUMO1* K2OsO4; IZUMO1* K2PtCl4; JUNO Form 1; JUNO Form 2*; IZUMO1-JUNO Form 1*; IZUMO1-JUNO Form 1* (S-SAD); IZUMO1-JUNO Form 2†; IZUMO1-JUNO Form 3‡

Data collection
Beamline: PF-AR NE3A; PF-AR NE3A; PF-AR NE3A; PF-AR NE3A; PF-AR NE3A; PF-AR NE3A; PF-1A; PF-AR NE3A; PF-AR NE3A
Wavelength (Å): 1.0000; 1.1400; 1.0723; 1.0000; 1.0000; 1.0000; 2.7000; 1.0000; 1.0000
Space group: P1; P1; P1; P212121; 2; C2221; C2221; C2; P2212
Cell dimensions
a, b, c (Å): 64.9, 75.1, 108.9; 64.8, 75.0, 107.9; 64.8, 75.0, 108.3; 51.9, 81.0, 235.1; 96.8, 88.6, 108.1; 65.2, 144.8, 141.9; 64.2, 144.5, 142.0; 145.6, 65.4, 77.2; 141.8, 144.9, 64.2
α, β, γ (°): 77.6, 79.1, 70.7; 77.9, 78.9, 71.3; 78.0, 79.1, 71.0; 90, 96.9, 90; 90, 104.2, 90
Resolution (Å): 50.0-2.10 (2.14-2.10); 50.0-2.50 (2.54-2.50); 50.0-2.20 (2.24-2.20); 50.0-2.00 (2.03-2.00); 44.3-3.23 (3.49-3.23); 50.0-2.90 (2.95-2.90); 47.3-3.20 (3.42-3.20); 50.0-2.90 (2.95-2.90); 48.3-2.86 (3.01-2.86)
Rsym: 0.052 (0.575); 0.067 (0.552); 0.058 (0.383); 0.110 (0.708); 0.152 (0.552); 0.072 (0.721); 0.196 (1.685); 0.084 (0.720); 0.0071 (1.061)
I/σI: 21.6 (1.7); 18.9 (1.3); 17.8 (2.0); 26.6 (1.9); 8.0 (2.5); 34.5 (2.0); 36.2 (5.5); 21.6 (2.4); 28.5 (2.7)
Completeness (%): 97.6 (92.0); 97.8 (95.9); 97.7 (96.7); 99.2 (93.8); 99.4 (99.8); 100.0 (100.0); 99.7 (99.1); 99.9 (99.2); 100.0 (100.0)
Redundancy: 3.5 (3.3); 3.5 (3.1); 3.5 (3.3); 12.3 (9.6); 3.4 (3.3); 6.5 (6.5); 196.7 (202.5); 6.6 (5.9); 13.2 (13.7)

Refinement (IZUMO1* Native; JUNO Form 1; JUNO Form 2*; IZUMO1-JUNO Form 1*; IZUMO1-JUNO Form 2†; IZUMO1-JUNO Form 3‡)
Resolution (Å): 50.0-2.10; 50.0-2.00; 44.3-3.23; 50.0-2.90; 50.0-2.90; 48.3-2.86
No. reflections: 101,240; 32,072; 13,842; 14,419; 15,006; 29,880
Rwork / Rfree: 19.6 / 23.1; 19.8 / 21.8; 21.7 / 24.2; 23.0 / 26.3; 22.3 / 25.2; 20.4 / 23.8
No. atoms
Protein: 11,096; 3,228; 6,394; 3,484; 3,476; 6,986
Water: 537; 149; 0; 5; 3; 4
B-factors
Protein: 48.8; 44.7; 72.6; 105.7; 88.4; 86.9
Water: 42.2; 42.1; -; 74.7; 42.9; 44.8
R.m.s. deviations
Bond lengths (Å): 0.012; 0.010; 0.010; 0.012; 0.011; 0.011
Bond angles (°): 1.62; 1.49; 1.46; 1.69; 1.60; 1.60

Each data set was collected with one crystal. Highest resolution shell is shown in parentheses.
*Deglycosylated samples were used for crystallizations.
†Isopropylated IZUMO1 (undeglycosylated) and methylated JUNO (deglycosylated) were used for crystallization.
‡Ethylated IZUMO1 (deglycosylated) and ethylated JUNO (undeglycosylated) were used for crystallization.
Proteome-wide covalent ligand discovery in native biological systems


Keriann M. Backus, Bruno E. Correia, Kenneth M. Lum, Stefano Forli, Benjamin D. Horning, Gonzalo E. González-Páez,
Sandip Chatterjee, Bryan R. Lanning, John R. Teijaro, Arthur J. Olson, Dennis W. Wolan & Benjamin F. Cravatt


Small molecules are powerful tools for investigating protein function 
and can serve as leads for new therapeutics. Most human proteins, 
however, lack small-molecule ligands, and entire protein classes 
are considered ‘undruggable’!”. Fragment-based ligand discovery 
can identify small-molecule probes for proteins that have proven 
difficult to target using high-throughput screening of complex 
compound libraries’. Although reversibly binding ligands are 
commonly pursued, covalent fragments provide an alternative route 
to small-molecule probes*””, including those that can access regions 
of proteins that are difficult to target through binding affinity 
alone*!®!!, Here we report a quantitative analysis of cysteine- 
reactive small-molecule fragments screened against thousands 
of proteins in human proteomes and cells. Covalent ligands were 
identified for >700 cysteines found in both druggable proteins 
and proteins deficient in chemical probes, including transcription 
factors, adaptor/scaffolding proteins, and uncharacterized proteins. 
Among the atypical ligand-protein interactions discovered were 
compounds that react preferentially with pro- (inactive) caspases. 
We used these ligands to distinguish extrinsic apoptosis pathways 
in human cell lines versus primary human T cells, showing that the 
former is largely mediated by caspase-8 while the latter depends on 
both caspase-8 and -10. Fragment-based covalent ligand discovery 
provides a greatly expanded portrait of the ligandable proteome 
and furnishes compounds that can illuminate protein functions in 
native biological systems. 

A major constraint of fragment-based ligand discovery (FBLD) 
methods is their reliance on assaying purified proteins. This aspect 
has restricted FBLD to proteins that can be produced in large quan- 
tities, and it accordingly remains unclear how many human proteins 
can be targeted by small molecules or whether these interactions can 
be optimized to furnish chemical probes for studying protein function 
in complex biological systems. We aimed to address these questions on 
a global scale by performing a quantitative analysis of the interactions 
between fragment electrophiles and thousands of cysteine residues in 
human proteomes and cells. 

We adapted a chemical proteomic method for quantifying cysteine 
reactivity—termed isotopic tandem orthogonal proteolysis-activity- 
based protein profiling (isoTOP-ABPP)'*'?—to perform cova- 
lent FBLD in native biological systems. Lysate or intact cells are 
pre-treated with dimethylsulfoxide (DMSO) or an electrophilic 
small-molecule fragment and then exposed to a broad-spectrum
cysteine-reactive probe, iodoacetamide (IA)-alkyne 1 (Fig. 1a). Proteins 
harbouring IA-alkyne-labelled cysteine residues from DMSO- and 
fragment-treated samples are then conjugated by copper-mediated 
azide-alkyne cycloaddition chemistry" to isotopically differentiated 
azide-biotin tags (heavy and light, respectively), combined, enriched 
by streptavidin, and proteolytically digested on-bead to yield isotopic
peptide pairs that are analysed by liquid chromatography-mass
spectrometry (LC-MS). Quantification of MS1 chromatographic peak
ratios for peptide pairs identifies fragment-competed Cys residues 
as those displaying high competition ratios, or R values, in DMSO/ 
fragment comparisons. 
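
As a minimal sketch of the ratio described above, the snippet below divides the heavy (DMSO) MS1 peak area by the light (fragment) peak area and converts the result into a percentage loss of IA-alkyne labelling. The function names and the handling of a zero light signal are assumptions for illustration, not the authors' software.

```python
def competition_ratio(heavy_dmso_area: float, light_fragment_area: float) -> float:
    """R value: MS1 peak area of the DMSO (heavy) sample over the fragment (light) sample."""
    if heavy_dmso_area < 0 or light_fragment_area < 0:
        raise ValueError("peak areas must be non-negative")
    if light_fragment_area == 0:
        # Assumption: a fully competed peptide is reported as an infinite ratio.
        return float("inf")
    return heavy_dmso_area / light_fragment_area


def percent_labelling_lost(r_value: float) -> float:
    """Percentage reduction in probe labelling implied by an R value."""
    if r_value <= 0:
        raise ValueError("R value must be positive")
    return 100.0 * (1.0 - 1.0 / r_value)


r = competition_ratio(8.0e6, 2.0e6)
print(r, percent_labelling_lost(r))
```

An R value of 4 therefore corresponds to a 75% loss of labelling, and an R value near 1 indicates no competition by the fragment.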

We constructed a fragment library predominantly containing chlo- 
roacetamide or acrylamide electrophiles (Fig. 1b and Extended Data 
Fig. 1), which are well-characterized cysteine-reactive groups'®!*"!8. 
These electrophiles were appended to structurally diverse small- 
molecule recognition (or binding) elements to create library members 
with an average molecular weight of 284 Da. Since our goal was to
probe the ligandability of cysteines in the human proteome, we
screened the electrophile library at a high concentration (500 μM)
similar to the compound concentrations used in FBLD experiments’.
A subset of the fragment library was initially assayed by competitive
profiling in a human MDA-MB-231 breast cancer cell line proteome
using IA-rhodamine probe 16, which permitted SDS-polyacrylamide
gel electrophoresis (SDS-PAGE) detection of cysteine reactivity events.
This experiment identified several proteins that showed reductions
in IA-rhodamine labelling in the presence of one or more fragments
(Extended Data Fig. 2a).

Figure 1 | Proteome-wide screening of covalent fragments.
a, General protocol for competitive isoTOP-ABPP. Competition ratios,
or R values, are measured by dividing the MS1 ion peaks for IA-alkyne
(1)-labelled peptides in DMSO-treated (heavy, blue) versus fragment-
treated (light, red) samples. LC/LC-MS/MS, multidimensional liquid
chromatography-tandem mass spectrometry. b, General structure of
electrophilic fragment library, in which the reactive (electrophilic) and
binding groups are coloured green and black, respectively. c, Competitive
isoTOP-ABPP analysis of the MDA-MB-231 cell proteome pre-treated
with the electrophilic 3,5-di(trifluoromethyl)aniline chloroacetamide 3
and acrylamide 14 fragments, along with the non-electrophilic acetamide
analogue 17 (500 μM each). Proteomic reactivity values, or liganded
cysteine rates, for fragments were calculated as the percentage of total
cysteines with R values > 4 in DMSO/fragment (heavy/light) comparisons.
d, Representative MS1 peptide ion chromatograms from competitive
isoTOP-ABPP experiments marking liganded cysteines selectively targeted
by one of three fragments, 3, 14 and 23.

We then used competitive isoTOP-ABPP to globally map human 
proteins and the cysteine residues within these proteins that are tar- 
geted by fragment electrophiles. Each fragment was tested against 
two human cancer cell proteomes (MDA-MB-231 and Ramos cells), 
and most fragments were screened in duplicate against at least one of 
these proteomes. On average, 927 cysteines were quantified per data 
set, and we required that individual cysteines were quantified in at 
least three data sets for interpretation. On the basis of these criteria, 
~6,150 cysteines from ~2,900 proteins were quantified in aggregate 
across all data sets with an average quantification frequency of 22 data 
sets per cysteine (Extended Data Fig. 2b). Fragment-competed cysteine 
residues, or ‘liganded’ cysteines, were defined as those showing >75% 
reductions in IA-alkyne labelling (R values > 4). To minimize the
potential for false positives, only cysteines that showed R values > 4 
in two or more data sets and met additional criteria for data quality 
control (see Supplementary Information) were considered as targets of 
the fragment electrophiles. The proteomic reactivity values, or liganded 
cysteine rates, of individual fragments were then calculated as the 
percentage of liganded per total quantified cysteines in isoTOP-ABPP 
experiments performed on that fragment. 
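
The calling rules in this paragraph can be sketched as follows. The thresholds come from the text as printed here (R values > 4, quantification in at least three data sets, hits in two or more data sets); the additional quality-control criteria mentioned in the Supplementary Information are not modelled, and all function and variable names are hypothetical.

```python
from typing import Mapping, Optional, Sequence


def classify_cysteine(
    r_values: Sequence[Optional[float]],
    threshold: float = 4.0,
    min_quantified: int = 3,
    min_hits: int = 2,
) -> str:
    """Classify one cysteine from its R values across data sets.

    None marks a data set in which the cysteine was not quantified.
    The strict '>' comparison follows the text as printed in this section.
    """
    quantified = [r for r in r_values if r is not None]
    if len(quantified) < min_quantified:
        return "not interpreted"
    hits = sum(1 for r in quantified if r > threshold)
    return "liganded" if hits >= min_hits else "unliganded"


def liganded_rate(r_by_cysteine: Mapping[str, float], threshold: float = 4.0) -> float:
    """Percentage of quantified cysteines competed by one fragment in one experiment."""
    if not r_by_cysteine:
        return 0.0
    hits = sum(1 for r in r_by_cysteine.values() if r > threshold)
    return 100.0 * hits / len(r_by_cysteine)


# Hypothetical R values for illustration only.
print(classify_cysteine([5.0, 6.2, 1.1]))
print(liganded_rate({"C1": 10.0, "C2": 1.0, "C3": 1.2, "C4": 0.9}))
```

Keeping the thresholds as parameters makes it easy to see how the liganded set would change under a stricter or looser cutoff.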

Most fragment electrophiles showed a tempered reactivity across the 
human proteome, with a median liganded cysteine rate of 3.8% for the 
library (Extended Data Fig. 2c). Substantial differences in reactivity 
were observed, however, with individual electrophiles showing liganded 
cysteine rates of <0.1% and others displaying rates >15% (Extended 
Data Fig. 2c). A subset of fragments was also screened at lower concentrations
(25-50 μM), which confirmed that their proteomic reactivities
were concentration-dependent (Extended Data Fig. 2d). The relative 
reactivity of fragment electrophiles was similar in MDA-MB-231 and 
Ramos cell proteomes (Extended Data Fig. 2e), indicating that this 
parameter is an intrinsic property of the compounds. Fragments also 
showed consistent reactivity profiles when assayed in biological repli- 
cate experiments (Extended Data Fig. 2f). We found that the proteomic 
reactivity of fragment electrophiles was only marginally correlated with 
their glutathione adduction potential, which is a commonly used sur- 
rogate assay for measurements of proteinaceous cysteine reactivity” 
(Extended Data Fig. 2g). We attribute these differences to the impact of
the recognition element of fragment electrophiles on their interactions
and reactivity with proteins.

[Figure 2 panel labels: a, liganded Cys (758) and unliganded Cys (5,399);
liganded proteins (637) and unliganded proteins (2,248). b, DrugBank
proteins (92) and non-DrugBank proteins (545; 86%).]

Figure 2 | Analysis of cysteines and proteins liganded by fragment
electrophiles. a, Fraction of total quantified cysteines and proteins that
were liganded by fragment electrophiles in competitive isoTOP-ABPP
experiments. b, Fraction of liganded proteins found in DrugBank.
c, Functional classes of DrugBank and non-DrugBank proteins containing
liganded cysteines. d, Comparison of the ligandability of cysteines as a
function of their intrinsic reactivity with the IA-alkyne probe. Cysteine
reactivity values (left y-axis) were taken from ref. 12, where lower ratios
correspond to higher cysteine reactivity. A moving average with a step-size
of 50 is shown in blue for the percentage of liganded cysteines within each
reactivity bin (percentage values shown on right y-axis).

A comparison of fragments 3, 14, 17, and 23-26 provided insights
into the relative proteomic reactivity of different electrophilic groups
coupled to a common recognition element (3,5-bis(trifluoromethyl)
phenyl group). Chloroacetamide 3 exhibited greater reactivity than
acrylamide 14 (Fig. 1c), with cyanoacrylamide 23, but not more sterically
congested electrophiles (24-26), exhibiting similar reactivity to
14 (Extended Data Fig. 2h). Importantly, the non-electrophilic acetamide
control fragment 17 showed negligible activity in competitive
isoTOP-ABPP experiments (Fig. 1c), indicating that the vast majority
of detected fragment-cysteine interactions reflected covalent reactions
versus non-covalent binding events. Also in support of this conclusion,
‘clickable’ alkyne analogues of 3 and 14 (compounds 19 and 18,
respectively) exhibited different concentration-dependent proteome
labelling profiles (19 > 18; Extended Data Fig. 2i) that mirrored the
respective liganded cysteine rates of 3 and 14 determined by
isoTOP-ABPP (3 > 14; Fig. 1c). Despite the greater overall proteomic
reactivity of 3 relative to 14 and 23, we found clear examples of cysteines
that were preferentially liganded by the latter fragments (Fig. 1d and
Supplementary Table 1).

Across all isoTOP-ABPP data sets combined, 758 liganded cysteines
were identified on 637 distinct proteins, which corresponded to ~12%
and 22% of the total quantified cysteines and proteins, respectively
(Fig. 2a and Supplementary Table 1). Only a modest fraction of the
proteins harbouring liganded cysteines were found in the DrugBank
database (14%; Fig. 2b), indicating that the fragment electrophiles
targeted many proteins that lack small-molecule probes. Among protein
targets with known covalent ligands, the fragment electrophiles
frequently targeted the same cysteine residues as these known ligands
(Extended Data Table 1a). For one of these targets, the protein kinase
BTK, we confirmed that interaction with the drug ibrutinib could
be detected by isoTOP-ABPP, which also identified a known ibrutinib
off-target, MAP2K7 (ref. 20), in Ramos cell lysates (Extended
Data Fig. 3a).

DrugBank proteins with liganded cysteines mostly originated from
classes that are regarded as druggable, including enzymes, channels and
transporters (Fig. 2c). Non-DrugBank proteins with liganded cysteines,
on the other hand, showed a broader class distribution that included
proteins, such as transcription factors and adaptor/scaffolding proteins,
which are considered challenging to target with small-molecule ligands
(Fig. 2c). We previously found that active-site and redox-active
cysteines show, in general, greater intrinsic reactivity (as measured
with the IA-alkyne probe) compared with other cysteines12. While this
heightened reactivity appears to be a contributory factor to the
ligandability of cysteines, as reflected in the high proportion of hyperreactive
(and active-site and redox-active) cysteines discovered as targets of
fragment electrophiles, liganded cysteines were also well represented
across a broad range of intrinsic reactivities and included many non-
active residues (Fig. 2d, Extended Data Fig. 3b, c and Supplementary
Discussion). Finally, most proteins were found to harbour a single
liganded cysteine among the several cysteines that were, on average,
quantified per protein by isoTOP-ABPP (Extended Data Fig. 3d, e and
Supplementary Discussion).
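
The percentages quoted above (~12% of cysteines, 22% of proteins, 14% in DrugBank) follow directly from the counts printed on the Figure 2 panels. A short worked check, with hypothetical variable names:

```python
def pct(part: int, whole: int) -> float:
    """Percentage of `part` within `whole`."""
    return 100.0 * part / whole


# Counts as printed on the Figure 2 panels.
liganded_cys, unliganded_cys = 758, 5399
liganded_proteins, unliganded_proteins = 637, 2248
drugbank, non_drugbank = 92, 545

print(round(pct(liganded_cys, liganded_cys + unliganded_cys), 1))                 # cysteines liganded
print(round(pct(liganded_proteins, liganded_proteins + unliganded_proteins), 1))  # proteins liganded
print(round(pct(drugbank, drugbank + non_drugbank), 1))                           # liganded proteins in DrugBank
```

The totals implied by these counts (6,157 cysteines and 2,885 proteins) agree with the approximate figures of ~6,150 cysteines and ~2,900 proteins given earlier in the text.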

Liganded cysteines, including those found in active and non-
active sites of proteins, displayed strikingly distinct structure-activity
relationships (SARs) with the fragment electrophile library (Fig. 3a,
Extended Data Fig. 3f-l, Supplementary Table 1 and Supplementary
Discussion). We also found that, for the majority of liganded cysteines
(>60%), electrophile (IA-alkyne or fragment) reactivity was blocked
by heat denaturation of the proteome, while only a modest fraction of
unliganded cysteines (~20%) showed decreased IA-alkyne labelling
after heat denaturation (Extended Data Fig. 3m, n and Supplementary
Discussion). These results indicate that the ligand-cysteine interactions
are, in general, specific, in that they depend on both the binding
groups of ligands and structured sites in proteins (see Supplementary
Discussion).

We next asked whether docking could predict sites of fragment 
electrophile reactivity. Covalent docking programs have recently been 
introduced to discover ligands that target pre-specified cysteines in pro- 
teins”!; here, however, we aimed to assess computationally the relative 
ligandability of all cysteines within a protein and match these outputs 
to the data acquired by isoTOP-ABPP (see Supplementary Discussion). 
The ranking of our computational predictions matched the experi- 
mental data for the majority of proteins investigated (that is, cases 
in which the top predicted ligandable cysteine matched the liganded 
cysteine determined by isoTOP-ABPP) (Fig. 3b, c and Extended Data 
Table 2). We also found that cysteines predicted to be ligandable were 
much more likely to have been detected by isoTOP-ABPP and exhibit 
heat-sensitive IA-alkyne reactivity (Extended Data Fig. 3o, p and
Extended Data Table 2). These results indicate that reactive docking 
can provide a good overall prediction of the ligandability of cysteines. 

To determine the functional impact of ligand-cysteine interactions 
mapped by isoTOP-ABPP, we initially selected two enzymes—the 
protein methyltransferase PRMT1 and the MAP3 kinase MLTK (also 
known as ZAK)—that possessed liganded cysteines with previously 
demonstrated activities'*!°, Our findings confirmed that the fragment 
electrophiles targeting PRMT1 and MLTK inhibited these enzymes 
(Extended Data Fig. 4a-d (PRMT1), Extended Data Fig. 4e-i (MLTK) 
and Supplementary Discussion). We next evaluated proteins that 
possess previously uncharacterized liganded cysteines, including the 
nucleotide biosynthetic enzyme IMPDH2 and p53-induced phos- 
phatase TIGAR (p53 also known as TP53). In both cases, we found 
that ligand-cysteine interactions affected specific functions of these 
proteins: regulatory nucleotide binding and catalytic activity, respectively
(Extended Data Fig. 5a-g (IMPDH2), Extended Data Fig. 5h-n
(TIGAR) and Supplementary Discussion). 

Competitive isoTOP-ABPP experiments identified distinct subsets of 
ligands that targeted a conserved cysteine in isocitrate dehydrogenases 1 
(IDH1) and 2 (IDH2) (C269 and C308, respectively; Supplementary 
Table 1). IDH1 and IDH2 are mutated in a number of human cancers 
to produce enzyme variants with a neomorphic catalytic activity that 
converts isocitrate to the oncometabolite 2-hydroxyglutarate (2-HG)”. 
Reversible inhibitors selective for mutant forms of IDH1 and IDH2 
have been developed and are under clinical investigation for cancer 
treatment”. The liganded cysteine is an active-site-proximal residue 
that is 13 Å from the NADP+ molecule in a crystal structure of IDH1
(Extended Data Fig. 6a). We confirmed that fragment ligands inhibited 
the activity of wild-type but not a C269S mutant of IDH1, and also 
blocked the R132H oncogenic mutant of IDH1 both in vitro and in cells 
(Extended Data Fig. 6b-k and Supplementary Discussion). 

Encouraged by the cellular activity displayed by IDH1 ligands, we 
sought to more generally assess the capacity of fragment electrophiles 
to modify cysteines in situ. Cells were treated with ~20 representative 
members of the fragment library (50-200 μM) for 2 h in situ and then
harvested, lysed, and analysed by isoTOP-ABPP. The tested fragments
showed a broad range of in situ reactivities that generally matched
their respective reactivities in vitro, although some exceptional cases
with greater or lesser reactivity in situ were noted (Extended Data
Fig. 6l and Supplementary Table 1). These differences could reflect the
impact of transport and/or metabolic pathways on the cellular
concentrations of fragment electrophiles. A substantial fraction (64%) of the
liganded cysteines identified in cell lysates were also sensitive to the
same electrophilic fragments in cells (Extended Data Fig. 6m). Four
fragment-cysteine interactions were observed in situ, but not in lysates,
including C182 of p53, a redox-regulated residue at the dimerization
interface of the DNA-binding domain’ (Extended Data Fig. 6n). These
liganded cysteines may require an intact cellular environment to
preserve their interactions with fragment electrophiles.

Figure 3 | Analysis of fragment-cysteine interactions. a, Heat map
showing R values for representative cysteines and fragments organized
by proteomic reactivity values (high to low, left to right) and percentage
of fragment hits for individual cysteines (high to low, top to bottom).
R values > 4 designate fragment hits (coloured medium and dark blue).
White indicates not detected (ND). b, Representative example of reactive
docking predictions shown for XPO1 (Protein Data Bank accession
3GB8). All accessible cysteines were identified and reactive docking was
conducted with all fragments from the library within a 25 Å docking cube
centred on each accessible cysteine (cube shown in green for liganded
Cys in XPO1; see Supplementary Information for more details). Legend
presents categories of XPO1 cysteines based on combined docking and
isoTOP-ABPP results. c, Success rate of reactive docking predictions
for liganded cysteines identified by isoTOP-ABPP for 29 representative
proteins.

Several fragments targeted the catalytic cysteine nucleophile C360 
of the protease caspase-8 (CASP8) in isoTOP-ABPP experiments per- 
formed in vitro and in situ (Extended Data Fig. 7a and Extended Data 
Table 1). Curiously, however, these fragments exhibited marginal to 
no inhibition of active CASP8 using either substrate or activity-based 
probe (Rho-DEVD-AOMK probe) assays (Extended Data Fig. 7b, c). 
This initially puzzling outcome was explained when we discovered 
that the electrophilic fragments selectively labelled the inactive zymo- 
gen (pro-), but not active form of CASP8 (Fig. 4a, b, Extended Data 
Fig. 7b-] and Supplementary Discussion). We synthesized a clicka- 
ble analogue of the most potent pro-CASP8 ligand 7 (61; Fig. 4a) and 
found that this probe (25 μM) strongly labelled pro-CASP8, but not a
pro-CASP8 C360S mutant (Fig. 4b and Extended Data Fig. 7i), and
directly modified C360 of CASP8 in Jurkat cell lysates (Extended Data
Table 1b). Compound 7 (50 μM) blocked labelling of pro-CASP8 by
61, but did not inhibit labelling of active CASP8 or other caspases by
the Rho-DEVD-AOMK probe” (Fig. 4b and Extended Data Fig. 7k, l).
Conversely, the general caspase inhibitor Ac-DEVD-CHO (20 μM)
blocked Rho-DEVD-AOMK labelling of active CASP8 and other
caspases, but not 61 labelling of pro-CASP8 (Fig. 4b and Extended
Data Fig. 7k, l). Similar results were obtained in substrate assays,
in which DEVD-CHO, but not 7, blocked CASP8 and CASP3 activity 
(Extended Data Fig. 7b). 

We next confirmed that 7, but not a structurally related inactive probe
(62; Extended Data Fig. 7f, g, k, m and Supplementary Discussion),
blocked Fas ligand (FasL)-, but not staurosporine (STS)-induced
apoptosis in Jurkat cells (Extended Data Fig. 7n-p). Chemical proteomic
experiments revealed that 7 fully inhibited CASP8, as well as the related
initiator caspase CASP10 (but not other caspases, including CASP2, 3, 6
and 9) in Jurkat cells (Fig. 4c, Extended Data Fig. 8a and Supplementary
Table 1). We confirmed that 7 blocked labelling of pro-CASP10 by 61
with an apparent half-maximum inhibitory concentration (IC50) value
of 4.5 μM (Extended Data Fig. 8b-d), but did not inhibit active CASP10
as measured by labelling with the Rho-DEVD-AOMK probe (Extended
Data Fig. 7l) or a substrate assay (Extended Data Fig. 8e).

The respective functions of CASP8 and CASP10 in extrinsic apop- 
tosis and other cellular processes remain poorly understood”>”*, in 
large part due to a lack of selective, non-peptidic, and cell-active inhib- 
itors for these enzymes and the absence of animal models for CASP10 
(which is not expressed in rodents). We therefore sought to address this 
challenge by improving the potency and selectivity of 7. Conversion 
of the 4-piperidino moiety to a 3-piperidino group and addition of a 
p-morpholino substituent to the benzoyl ring of 7 furnished compound 
63 that was separated by chiral chromatography into its two purified 
enantiomers, 63-R (Fig. 4c) and 63-S (see Supplementary Information), 
with 63-R showing substantially improved activity against CASP8
compared to compound 7 (63-R apparent IC50 value of 0.7 μM (95%
confidence interval (CI), 0.5-0.8); Extended Data Fig. 8f-h) and negligible
cross-reactivity with CASP10 (IC50 value >100 μM; Extended
Data Fig. 8c, d, f). 63-S was much less active against CASP8 (apparent
IC50 value of 15 μM; Extended Data Fig. 8g, h) and also inactive against
CASP10 (Extended Data Fig. 8d). With dual CASP8-CASP10 (7) and
CASP8-selective (63-R) ligands in hand, we next set out to investigate 
the biological functions of these proteases. 
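
To give a feel for what the apparent IC50 values above imply, the sketch below evaluates a simple one-site inhibition curve. This is an illustration only: the text does not state which model was fitted, so the Hill slope of 1 and the function name `fraction_remaining` are assumptions.

```python
def fraction_remaining(conc_uM: float, ic50_uM: float, hill: float = 1.0) -> float:
    """Fraction of probe labelling remaining at a given ligand concentration.

    Assumption: one-site inhibition with the given Hill slope (default 1).
    """
    if conc_uM < 0 or ic50_uM <= 0:
        raise ValueError("concentration must be >= 0 and IC50 must be > 0")
    if conc_uM == 0:
        return 1.0
    return 1.0 / (1.0 + (conc_uM / ic50_uM) ** hill)


# Apparent IC50 values against pro-CASP8 quoted in the text (micromolar).
ic50_63R, ic50_63S = 0.7, 15.0

for conc in (0.1, 1.0, 10.0):
    print(conc, round(fraction_remaining(conc, ic50_63R), 2),
          round(fraction_remaining(conc, ic50_63S), 2))

# Fold difference in apparent potency between the two enantiomers.
print(round(ic50_63S / ic50_63R, 1))
```

Under this simple model, labelling is half-blocked when the ligand concentration equals the IC50, so the two enantiomers differ by roughly twentyfold in the concentration needed for the same effect.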

We evaluated the effects of our caspase ligands in human T cells, 
in which both CASP8 and CASP10 are highly expressed (Extended 
Data Fig. 8i) and their respective roles much debated?>°, as well as 
in Jurkat cells, which are a commonly studied immortalized human 
T-cell line. We found that 63-R fully blocked FasL-induced apoptosis 
in Jurkat cells and did so with greater potency than 7 (Fig. 4d and 

Figure 4 | Electrophile compounds that target pro-CASP8 and
pro-CASP10. a, Compound 7 blocked IA-rhodamine 16 labelling of
recombinant, purified pro-CASP8 (bearing a C409S mutation to eliminate
16 labelling at this site; added to Ramos cell lysate at 1 μM). Note that a
C360S/C409S-mutant of pro-CASP8 did not label with 16. b, 7 blocked
probe labelling of pro-, but not active, CASP8. Recombinant pro- and
active CASP8 (1 μM) were treated with 7 (50 μM) or Ac-DEVD-CHO
(20 μM) for 1 h, followed by click probe 61 (25 μM) for pro-CASP8 and
the Rho-DEVD-AOMK probe (2 μM) for active CASP8. c, Heat map
showing R values for caspases measured by quantitative proteomics in
Jurkat cells treated with 7, 63-R or 62 followed by probe 61 (10 μM, 1 h).
d, Comparison of effects of 7 and 63-R on FasL-induced apoptosis in
Jurkat cells or anti-CD3, anti-CD28-activated primary human T cells.
Data represent mean values ± standard error of the mean (s.e.m.) for
at least three independent experiments, and results are representative
of multiple experiments performed with T cells from different human
subjects. Statistical significance was calculated with unpaired Student's
t-tests comparing DMSO- to compound-treated samples, **P < 0.01,
****P < 0.0001; and comparing compound 7 to 63-R, ##P < 0.01,
###P < 0.001, ####P < 0.0001.


Extended Data Fig. 8j) or 63-S (Extended Data Fig. 8k). Similar results 
were obtained in HeLa cells, which express CASP8, but not CASP10 
(Extended Data Fig. 8l)*°. In contrast to these cell line results, FasL-induced
apoptosis in primary human T cells showed substantial resistance
to 63-R at all tested concentrations and instead was completely
inhibited by the dual CASP8/10 ligand 7 (Fig. 4d). We confirmed by
chemical proteomics with probe 61 that 7 blocked both CASP8 and
CASP10, while 63-R inhibited CASP8, but not CASP10, in primary
human T cells (Supplementary Table 1) and Jurkat cells (Fig. 4c and 
Supplementary Table 1). Consistent with these cell death results, 7, 
but not 63-R, prevented proteolytic processing of CASP3 and CASP10 
in primary human T cells (Extended Data Fig. 8m). Interestingly, the 
processing of both CASP8 and the initiator caspase substrate RIP kinase 
were also preferentially inhibited by 7 versus 63-R (Extended Data 
Fig. 8m), indicating that CASP10 may contribute to these proteolytic 
events in T cells, as has been suggested by biochemical studies”’. These 
data, taken together, support substantive functions for both CASP8 and 
CASP10 in primary human T cells and are consistent with genetic findings
indicating that deleterious mutations in either CASP8 or CASP10
can lead to autoimmune syndromes in humans”*. 

By combining chemical proteomics with FBLD, we have found 
that the human proteome contains many ligandable cysteines. These 
cysteines were found in proteins not previously known to interact 
with small molecules, revealing that covalent chemistry can be used to 
expand the druggable content of the human proteome. Our results for 
pro-CASP8 and the results of others for K-Ras(G12C)'*" indicate that
it is possible to improve the potency and selectivity of covalent fragment 
hits for protein targets, although the optimization of covalent ligands 
for cysteines that reside in very shallow pockets may prove more chal- 
lenging. Some covalent ligands may target cysteines at non-functional 
sites on proteins, and, in these cases, there is potential to convert the 
ligands into functional probes using emergent platforms for directing 
liganded proteins to degradation pathways in the cells?*3°. We envision 
that extensions of our chemical proteomic platform could be used to 
discover ligands that target other nucleophilic amino acids in proteins, 
thereby increasing the impact covalent chemistry will have on pro- 
teome-wide ligand and drug discovery. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
Preparation of human cancer cell line proteomes. With the exception of MUM2C
cells, which were provided by M. Hendrix, all cell lines were obtained from ATCC, 
tested negative for mycoplasma contamination, and were used without further 
authentication, maintaining a low passage number (<10 passages). Cell lines were 
grown at 37°C with 5% CO2. MDA-MB-231 (ATCC: HTB-26), HeLa (ATCC: CCL-2)
and HEK-293T (ATCC: CRL-3216) cells were grown in DMEM supplemented with 
10% fetal bovine serum (FBS), penicillin, streptomycin and glutamine. Jurkat A3 
(ATCC: CRL-2570), Ramos (ATCC: CRL-1596) and MUM2C cells were grown in 
RPMI-1640 medium supplemented with 10% FBS, penicillin and streptomycin. For 
in vitro labelling, cells were grown to 100% confluence for MDA-MB-231 cells or 
until cell density reached 1.5 million cells per ml for Ramos and Jurkat cells. Cells 
were washed with cold PBS, scraped with cold PBS and cell pellets were isolated 
by centrifugation (1,400g, 3 min, 4°C), and stored at −80°C until use. Cell pellets
were lysed by sonication and fractionated (100,000g, 45 min) to yield soluble and
membrane fractions, which were then adjusted to a final protein concentration of
1.5 mg ml−1 for proteomics experiments and 1 mg ml−1 for gel-based ABPP experiments.
The soluble lysate was prepared fresh from frozen pellets directly before
each experiment. Protein concentration was determined using the Bio-Rad DC 
protein assay kit. 

Proteomic sample preparation. IsoTOP-ABPP, stable isotope labelling by amino 
acids in cell culture (SILAC) and reductive dimethylation for stable isotope 
labelling (REDIME) samples were prepared and analysed as has been reported 
previously!”3!~*3. For details see Supplementary Information. 

In vitro covalent fragment treatment. All compounds were made up as 50 mM
stock solutions in DMSO and were used at a final concentration of 500 μM. Owing
to its low solubility in aqueous medium, fragment 4 was screened at a final concentration
of 250 μM. Soluble lysates were adjusted to 1.5 mg ml−1 and, for each
profiling sample, 0.5 ml of lysate was treated with 5 μl of the 50 mM compound
stock solution or 5 μl of DMSO.
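
The dosing arithmetic above can be checked with a short Python sketch. The function name and the optional volume correction are illustrative assumptions, not part of the published protocol; the nominal 500 μM value ignores the roughly 1% volume increase caused by adding the stock.

```python
def final_concentration_um(stock_mm, stock_ul, sample_ul, include_added_volume=True):
    """Return the final compound concentration in micromolar.

    stock_mm  -- stock concentration in mM
    stock_ul  -- volume of stock added, in microlitres
    sample_ul -- volume of lysate before addition, in microlitres
    """
    # Optionally account for the extra volume contributed by the stock itself.
    total_ul = sample_ul + stock_ul if include_added_volume else sample_ul
    return stock_mm * 1000.0 * stock_ul / total_ul


# 5 ul of a 50 mM stock added to 0.5 ml of lysate.
nominal = final_concentration_um(50, 5, 500, include_added_volume=False)
corrected = final_concentration_um(50, 5, 500)
print(f"nominal: {nominal:.0f} uM, volume-corrected: {corrected:.1f} uM")
```

The same helper applies to the other stock additions in these Methods, provided the volumes are expressed in microlitres.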

In situ covalent fragment treatment. For in situ labelling, MDA-MB-231 cells
were grown to 95% confluence and Ramos cells were grown to 1 million cells per
ml. The media in all samples was replaced with fresh media containing 200 μM
of the indicated fragments, and the cells were incubated at 37°C for 2 h, washed
with cold PBS, scraped into cold PBS and harvested by centrifugation (see earlier).
Fragments 2, 3, 8, 9, 10, 12, 13, 14, 21, 27, 28, 29, 31, 33, 38, 45, 51 and 56 were
screened at 200 μM in situ. Fragments 4 and 11 were screened at 100 μM in situ.
Fragments 2, 3, 8 and 20 were tested at 50 μM in situ.

Heat inactivation. For heat inactivation experiments, 500 μl of MDA-MB-231
soluble proteome was denatured (95°C, 10 min) and allowed to cool to ambient
temperature. The denatured sample and corresponding non-denatured, native
proteome (500 μl) were then each labelled with 100 μM iodoacetamide alkyne
(IA-alkyne, 5 μl of 10 mM stock in DMSO) and evaluated by isoTOP-ABPP.

R value calculation and processing. The ratios of heavy/light for each unique pep- 
tide (DMSO/compound treated; isoTOP-ABPP ratios, R values) were quantified 
with in-house CIMAGE software’, using default parameters (3 MS1 acquisitions 
per peak and signal to noise threshold set to 2.5). Site-specific engagement of elec- 
trophilic fragments was assessed by blockade of IA-alkyne probe labelling. For pep- 
tides that showed a >95% reduction in MS1 peak area from the fragment-treated 
proteome (light TEV tag) when compared to the DMSO-treated proteome (heavy 
TEV tag), a maximal ratio of 20 was assigned. Ratios for unique peptide entries 
are calculated for each experiment; overlapping peptides with the same modified 
cysteine (for example, different charge states, MudPIT chromatographic steps or 
tryptic termini) are grouped together and the median ratio is reported as the final 
ratio (R). The peptide ratios reported by CIMAGE were further filtered to ensure 
the removal or correction of low-quality ratios in each individual data set. The 
quality filters applied were the following: removal of half tryptic peptides; for ratios 
with high standard deviations from the median (90% of the median or above) the 
lowest ratio was taken instead of the median; removal of peptides with R= 20 and 
only a single MS2 event triggered during the elution of the parent ion; manual 
annotation of all the peptides with ratios of 20, removing any peptides with low- 
quality elution profiles that remained after the previous curation steps. Proteome 
reactivity values for individual fragments were computed as the percentage of the 
total quantified cysteine-containing peptides with R values > 4 (defined as liganded 
cysteines) for each replicate experiment and the final proteome reactivity value was 
calculated as the mean for all replicate experiments for each fragment from both 
MDA-MB-231 and Ramos cellular proteomes. See Supplementary Information 
for additional details. 
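
The core of the ratio processing described above can be sketched in Python as follows. This is a minimal illustration, not the CIMAGE implementation: the function names are hypothetical, the use of the population standard deviation is an assumption, and the manual curation and MS2-based filters are omitted.

```python
from statistics import mean, median, pstdev

MAX_RATIO = 20.0          # maximal ratio assigned to near-complete blockade
LIGANDED_THRESHOLD = 4.0  # R values above this are counted as liganded


def site_ratio(peptide_ratios):
    """Combine ratios of overlapping peptides that share one modified cysteine."""
    capped = [min(r, MAX_RATIO) for r in peptide_ratios]
    med = median(capped)
    # Assumption: "high standard deviation" means SD >= 90% of the median;
    # in that case the lowest ratio is reported instead of the median.
    if len(capped) > 1 and pstdev(capped) >= 0.9 * med:
        return min(capped)
    return med


def proteome_reactivity(site_ratios):
    """Percentage of quantified cysteines with R above the liganded threshold."""
    ratios = list(site_ratios)
    liganded = sum(1 for r in ratios if r > LIGANDED_THRESHOLD)
    return 100.0 * liganded / len(ratios)


def mean_reactivity(replicates):
    """Mean proteome reactivity across replicate experiments for one fragment."""
    return mean(proteome_reactivity(rep) for rep in replicates)
```

Taking the lowest ratio for noisy groups is the conservative choice, because it avoids calling a cysteine liganded on the strength of a single outlying peptide.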

Functional annotation of liganded cysteines. Custom Python scripts were used 
to compile functional annotations available in the UniProtKB/Swiss-Prot Protein 
Knowledge* database (release-2012_11). Relevant UniProt entries were mined for 
available functional annotations at the residue level, specifically for annotations 
regarding enzyme catalytic residues (active sites), disulfides (redox active and
structural) and metal-binding sites. Liganded proteins were queried against the 
Drugbank database (v.4.2)*° and fractionated into DrugBank and non-DrugBank 
proteins. Functional keywords assigned at the protein level were collected from the 
UniProt database and the DrugBank and non-DrugBank categories were further 
classified into protein functional classes. Previously collected cysteine reactivity 
data!? were re-processed using ProLuCID as detailed in the Supplementary 
Information. Cysteines found in both the reactivity and ligandability data sets 
were sorted on the basis of their reactivity values (lower ratio indicates higher 
reactivity). The moving average of the percentage of total liganded cysteines 
within each reactivity bin (step-size 50) was taken. Custom Python scripts were 
developed to collect relevant NMR and X-ray structures from the RCSB Protein 
Data Bank (PDB)”*. For proteins without available PDB structures, sequence 
alignments, performed with BLAST” to proteins deposited in the PDB, were used 
to identify structural homologues. For annotation of active-site and non-active 
cysteines, enzymes with structures in the PDB were manually inspected to eval- 
uate the location of the cysteine. Cysteines were considered to reside in enzyme 
active sites if they were within 10 Å of an active-site ligand or residue(s). Cysteines
outside of the 10 Å range were deemed non-active-site residues. Histograms of
fragment hit-rates across high-coverage, ligandable cysteines, active-site and 
non-active site cysteines were calculated from the subset of ligandable cysteines 
quantified in ten or more separate experiments. For analyses of trends within 
the whole data, including histograms and heat maps, a cell-line merged data 
set was used in which data from the MDA-MB-231 experiments were taken 
first and the Ramos data were used if there were no data from MDA-MB-231 
experiments for a particular fragment and cysteine. Heat maps were generated 
in R (v.3.1.3) using the heatmap.2 algorithm. Protein structures were rendered 
using PyMol°®, 
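
Two of the rules in this section, the cell-line merge and the 10 Å active-site criterion, can be sketched in Python. Both functions are hypothetical illustrations rather than the authors' scripts; the data layout (dictionaries keyed by fragment and cysteine, atoms as coordinate tuples in Å) and the inclusive cutoff are assumptions.

```python
import math


def merge_cell_lines(mda_mb_231, ramos):
    """Merge R values keyed by (fragment, cysteine).

    MDA-MB-231 values take priority; Ramos values are used only where
    no MDA-MB-231 measurement exists for that fragment and cysteine.
    """
    merged = dict(ramos)
    merged.update(mda_mb_231)
    return merged


def is_active_site_cysteine(cysteine_atoms, site_atoms, cutoff=10.0):
    """True if any cysteine atom lies within `cutoff` angstroms of any
    atom of an active-site ligand or residue."""
    return any(
        math.dist(a, b) <= cutoff
        for a in cysteine_atoms
        for b in site_atoms
    )


mda = {("3", "P1_C10"): 5.0}
ramos = {("3", "P1_C10"): 2.0, ("3", "P2_C5"): 7.0}
merged = merge_cell_lines(mda, ramos)
```

In the paper the active-site assignment was made by manual inspection of structures, so the distance function only captures the stated cutoff, not the full judgment involved.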

Reactive cysteine docking. In silico fragment library containing all chloroaceta- 
mide and acrylamide fragments from Extended Data Fig. 1 was prepared using 
Open Babel library*’ with custom Python scripts. Fragments were modelled in 
their reactive form (that is, with explicit chloroacetamide and acrylamide war- 
heads). Three-dimensional coordinates were generated from SMILES strings, 
calculating their protonation state at pH 7.4, and then minimizing them using 
MMFF94s force field (50K iterations steepest descent; 90K conjugate gradient); for
chiral molecules with undefined configuration, all enantiomers were generated, 
resulting in 53 total fragments. For each protein, the UniProtKB accession num- 
ber was used to filter the PDB**. Structures determined by X-ray crystallography 
were selected, privileging higher sequence coverage and structure resolution (see 
Extended Data Table 2 for selected PDB accessions). When no human structures 
were available, the closest homologous organism available was selected (for exam- 
ple, PRMT1: R. norvegicus). Protein structures were prepared following the stand- 
ard AutoDock protocol. Waters, salts and crystallographic additives were removed; 
AutoDockTools”’ was used to add hydrogens, calculate Gasteiger–Marsili charges
and generate PDBQT files. MSMS reduced surface method"! was used to iden- 
tify accessible cysteines. The protein volume was scanned using a probe radius of 
1.5 Å; residues were considered accessible if they had at least one atom in contact
with either external surfaces or internal cavities. The fragment library was docked
independently on each accessible cysteine using AutoDock 4.2 (ref. 40). A grid box
of 24.4 × 24.4 × 24.4 Å was centred on the geometric centre of the residue; thiol
hydrogen was removed from the side chain, which was modelled as flexible during
the docking; the rest of the structure was kept rigid. A custom 13-7 interaction
potential was defined between the nucleophile sulfur and the reactive carbon in
the ligands. The equilibrium distance (r_eq) was set to the length of the C–S covalent
bond (1.8 Å); the potential well depth (ε_eq) varied between 1.0 and 0.175 to model
the reactivity of the different ligands. For each fragment, potential well depth
was determined by dividing its proteomic reactivity percentage by 20, and the
value for iodoacetamide was approximated as the maximum (2.5) for reference.
The potential was implemented by modifying the force field table of AutoDock.
Fragments were docked with no constraints, generating 100 poses using the default
GA settings. For each fragment, the best docking score pose was analysed: if the
distance between the nucleophilic sulfur and the reactive carbon was <2.0 Å, the
cysteine was considered covalently modified. If a residue was alkylated by at least 
one ligand, it was considered labelled. The docking score (that is, negative binding 
energy) was calculated based on the estimated interaction energy of each fragment 
in its docked pose. The docking score of the best alkylating fragment defined the 
labelling score. The residue with the best labelling score was considered the most 
probable to be labelled. 
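
The scoring rules above can be sketched in Python. This is an illustration under stated assumptions, not the modified AutoDock force field: the 13-7 potential is assumed to take the generic n-m form used for AutoDock pairwise terms, with coefficients chosen so that the minimum of depth ε lies at r_eq, and all function names are hypothetical.

```python
def well_depth(proteome_reactivity_percent):
    """Potential well depth: proteomic reactivity percentage divided by 20."""
    return proteome_reactivity_percent / 20.0


def potential_13_7(r, epsilon, r_eq=1.8):
    """Assumed 13-7 potential with a minimum of -epsilon at r = r_eq (angstroms)."""
    c13 = 7.0 / 6.0 * epsilon * r_eq ** 13   # repulsive coefficient
    c7 = 13.0 / 6.0 * epsilon * r_eq ** 7    # attractive coefficient
    return c13 / r ** 13 - c7 / r ** 7


def is_covalently_modified(s_c_distance, cutoff=2.0):
    """A cysteine counts as modified if the S-C distance is below 2.0 angstroms."""
    return s_c_distance < cutoff


def labelling_score(best_poses):
    """best_poses: one (docking_score, S-C distance) pair per fragment,
    taken from that fragment's best-scoring pose. The docking score is the
    negative binding energy, so larger is better. Returns the score of the
    best alkylating fragment, or None if no fragment alkylates the residue."""
    scores = [score for score, dist in best_poses if is_covalently_modified(dist)]
    return max(scores) if scores else None
```

Ranking the accessible cysteines of a protein by `labelling_score` then gives the residue predicted as most likely to be labelled.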

CASP3 and CASP8 in vitro activity assays. CASP3 and 8 assays were conducted
with CASP8 activity assay kit (BioVision, K112-100) and CASP3 activity assay kit 
(Invitrogen, EnzChek Caspase-3 Assay Kit), following the manufacturer's instruc- 
tions. For further details see Supplementary Information. 
Primary human T-cell isolation and stimulation. All studies using samples from
human volunteers follow protocols approved by The Scripps Research Institute 
institutional review board (protocol no. IRB-15-6682). Blood from healthy donors 
was obtained after informed consent and peripheral blood mononuclear cells 
(PBMCs) were purified over Ficoll-Hypaque gradients (Sigma-Aldrich). T cells 
were purified via negative selection with magnetic beads (EasySep, STEMCELL). 
The purified T cells were washed with sterile PBS and resuspended in RPMI-1640 
supplemented with FBS, penicillin, streptomycin and glutamine (2 million cells per ml) 
and 200,000 cells per well were seeded on non-treated tissue culture, 96-well trans- 
parent plates that had been coated with anti-CD3 (1:200, BioXcell) and anti-CD28 
(1:500, Biolegend) in PBS (100 μl per well). The T cells were removed from
stimulus after 3 days and cultured in complete RPMI-1640 supplemented with IL-2
(10 μg ml−1, eBioscience) for 3–4 additional days.

Apoptosis assays in primary human T cells with CASP8 inhibitors. Primary 
human T cells were stimulated for 3 days with anti-CD3 and anti-CD28, and the 
cells were then washed and cultured in complete RPMI with IL-2 (10 μg ml−1) for
4 additional days. For western blot analysis, 10 ml of stimulated primary human
T cells (1.5 million cells per ml) in RPMI with IL-2 were then treated with the indicated
compounds for 1 h before addition of FasL (1 μl of 100 μg ml−1 stock solution
of MegaFasLigand in water, final concentration = 10 ng ml−1, Adipogen). After
3h, cells were harvested by centrifugation, washed in PBS and lysed in cell lysis 
buffer (BioVision, 1067-100) with 1 x complete protease inhibitor (Roche) and 
40 μg of each sample were separated by SDS-PAGE on 14% polyacrylamide gels.
The gels were transferred to nitrocellulose membranes and were immunoblot- 
ted overnight with the indicated antibodies. For measurements of cell viability, in 
triplicate for each condition, 150,000 cells (100 μl of 1.5 million cells per ml) were
plated in 96-well optical-bottom plates. FasL was used at the same concentration 
indicated earlier with a 30 min pre-incubation with compounds at the indicated 
concentrations, followed by 4h with FasL or DMSO. Twenty times compound stock 
solutions were made in RPMI immediately before use. Cell viability was measured 
with CellTiter-Glo Luminescent Cell Viability Assay (Promega) and was read on a
BioTek Synergy 4 plate reader.

Western blotting. For apoptosis studies, cell pellets were resuspended in cell lysis 
buffer (BioVision, 1067-100) with 1× complete protease inhibitor (Roche)
and allowed to incubate on ice for 30 min before centrifugation (10 min, 16,000g). 
For all other studies, cell pellets were resuspended in PBS and lysed with sonica- 
tion before centrifugation (10 min, 16,000g). The proteins were then resolved by 
SDS-PAGE and transferred to nitrocellulose membranes, blocked with 5% milk 
in TBST and probed with the indicated antibodies. The primary antibodies and 
the dilutions used are as follows: anti-PARP (Cell Signaling, 9532, 1:1,000), anti- 
CASP3 (Cell Signaling, 9662, 1:1,000), anti-CASP8 (cleaved form, Cell Signaling, 
9746, 1:1,000), anti-CASP8 (pro-form, Cell Signaling, 4970, 1:1,000), anti-IDH1
(Cell Signaling, 3997S, 1:500), anti-actin (Cell Signaling, 3700, 1:3,000), anti-GAPDH
(Santa Cruz, sc-32233, 1:2,000), anti-Flag (Sigma Aldrich, F1804, 1:3,000),
anti-CASP10 (MBL, M059-3, 1:1,000), anti-RIPK (Cell Signaling, 3493S, 1:1,000).
Blots were incubated with primary antibodies overnight at 4°C with rocking and 
were then washed (3 x 5 min, TBST) and incubated with secondary antibodies 
(LICOR, IRDye 800CW or IRDye 800LT, 1:10,000) for 1h at ambient temper- 
ature. Blots were further washed (3 x 5 min, TBST) and visualized on a LICOR 
Odyssey Scanner. The percentage cleavage was determined by quantifying the 
integrated optical intensity of the bands (n= 3 for STS and n=2 for FasL), using 
ImageJ software”. For CASP8, the 43 and 41 kDa bands were quantified together.
For CASP3, the 17 kDa subunit band was quantified. 

Statistical analysis. The experiments were not randomized. The investigators 
Extended Data Figure 1 | Composition of fragment electrophile library and structures of additional tool compounds, click probes and fragments.
Extended Data Figure 2 | Analysis of proteomic reactivities of fragment
electrophiles in human cell lysates. a, Initial analysis of the proteomic
reactivity of fragments using an IA-rhodamine probe 16. Soluble proteome
from Ramos cells was treated with the indicated fragments (500 μM
each) for 1 h, followed by labelling with IA-rhodamine (1 μM, 1 h) and
analysis by SDS-PAGE and in-gel fluorescence scanning. Several proteins
were identified that show impaired reactivity with IA-rhodamine in the
presence of one or more fragments (asterisks). Fluorescent gel shown
in greyscale. b, Frequency of quantification of all cysteines across the
complete set of competitive isoTOP-ABPP experiments performed with
fragment electrophiles. Note that cysteines were required to have been
quantified in at least three isoTOP-ABPP data sets for interpretation.
c, Rank order of proteomic reactivity values (or liganded cysteine rates)
of fragments calculated as the percentage of all quantified cysteines with
R values > 4 for each fragment. The majority of fragments were evaluated
in 2–4 replicate experiments in MDA-MB-231 and/or Ramos cell lysates,
and their proteomic reactivity values are reported as mean ± s.e.m.
values for the replicates. d, Comparison of the proteomic reactivities of
representative fragments screened at 500 versus 25 μM in cell lysates.
e, Comparison of proteomic reactivity values for fragments tested in
both Ramos and MDA-MB-231 lysates. f, Proteomic reactivity values of
representative fragments. g, Relative GSH reactivity for representative
fragment electrophiles. Consumption of GSH (125 μM) was measured
using Ellman’s reagent (5 mM) after 1 h incubation with the indicated
fragments (500 μM). h, Proteomic reactivity values for fragment
electrophiles (500 μM) possessing different electrophilic groups attached
to a common binding element. i, Concentration-dependent labelling of
MDA-MB-231 soluble proteomes with acrylamide 18 and chloroacetamide
19 click probes detected by CuAAC with a rhodamine-azide tag and
analysis by SDS-PAGE and in-gel fluorescence scanning. For f and g,
data represent mean values ± s.e.m. for at least three independent
experiments.
Extended Data Figure 3 | Analysis of cysteines liganded by
fragment electrophiles in competitive isoTOP-ABPP experiments.
a, Representative MS1 ion chromatograms for peptides containing C481
of BTK and C131 of MAP2K7, two cysteines known to be targeted by the
anti-cancer drug ibrutinib. Ramos cells were treated with ibrutinib (1 μM,
1 h, red trace) or DMSO (blue trace) and evaluated by isoTOP-ABPP.
b, Total number of liganded cysteines found in the active sites and
non-active sites of enzymes for which X-ray and/or NMR structures
have been reported (or reported for a close homologue of the enzyme).
c, Functional categorization of liganded and unliganded cysteines
based on residue annotations from the UniProt database. d, Number of
liganded and quantified cysteines per protein measured by isoTOP-ABPP.
Respective average values of one and three for liganded and quantified
cysteines per protein were measured by isoTOP-ABPP. e, R values for
six cysteines in XPO1 quantified by isoTOP-ABPP, identifying C528
as the most liganded cysteine in this protein. Each point represents a
distinct fragment-cysteine interaction quantified by isoTOP-ABPP.
f–h, Histograms depicting the percentage of fragments that are hits (R > 4)
for all 768 liganded cysteines (f), for liganded cysteines found in enzymes
for which X-ray and/or NMR structures have been reported (or reported
for a close homologue of the enzyme) (g), or for active- and non-active site
cysteines in kinases (h). i, Percentage of liganded cysteines targeted only
by group A (red) or B (blue) fragments or both group A and B fragments
(black). Shown for all liganded cysteines, liganded cysteines in enzyme
active and non-active sites, and liganded cysteines in transcription
factors/regulators. For g, i, active-site cysteines were defined as those
that reside <10 Å from established active-site residues and/or bound
substrates/inhibitors in enzyme structures. j, The percentage of liganded
cysteines in kinases that were targeted by only group A, only group B,
or both group A and B compounds. k, Heat map showing representative
fragment interactions for liganded cysteines found in the active sites and
non-active sites of kinases. l, Heat map showing representative fragment
interactions for liganded cysteines found in transcription factors/regulators.
m, The fraction of liganded (62%; 341 of 553 quantified cysteines)
and unliganded (20%; 561 of 2,870 quantified cysteines) cysteines that
are sensitive to heat denaturation measured by IA-alkyne labelling
(R > 3 native/heat denatured). n, Percentage of proteins identified
by isoTOP-ABPP as liganded by fragments 3 and 14 and enriched by
their corresponding click probes 19 and 18 that are sensitive to heat
denaturation (64% (85 of 133 quantified protein targets) and 73%
(19 of 26 quantified protein targets), respectively). Protein enrichment
by 18 and 19 was measured by whole-protein capture of isotopically
SILAC-labelled MDA-MB-231 cells using quantitative (SILAC)
proteomics. o, The fraction of cysteines predicted to be ligandable or
unligandable by reactive docking that were quantified in isoTOP-ABPP
experiments. p, The fraction of cysteines predicted to be ligandable or
unligandable by reactive docking that show heat-sensitive labelling by the
IA-alkyne probe (R > 3 native/heat denatured).
Extended Data Figure 4 | Confirmation and functional analysis of
fragment-cysteine interactions in PRMT1 and MLTK. a, Representative
MS1 chromatograms for the indicated Cys-containing peptides from
PRMT1 quantified in competitive isoTOP-ABPP experiments of
MDA-MB-231 cell lysates, showing blockade of IA-alkyne 1 labelling
of C109 by fragment 11, but not control fragment 3. b, 11, but not
3, blocked IA-rhodamine (2 μM) labelling of recombinant, purified
wild-type PRMT1 (1 μM protein doped into HEK293T cell lysates).
Note that a C109S PRMT1 mutant did not react with IA-rhodamine.
c, Apparent IC50 curve for blockade of 16 labelling of PRMT1 by 11.
CI, 95% confidence intervals. d, Effect of 11 and control fragment 3 on
methylation of recombinant histone 4 by recombinant PRMT1. Shown
is one representative experiment of three independent experiments that
yielded similar results. e, Representative MS1 ion chromatograms for
the MLTK tryptic peptide containing liganded cysteine C22 quantified
by isoTOP-ABPP in MDA-MB-231 lysates treated with fragment 4 or
control fragment 3 (500 μM each). f, 60, but not control fragment 3
(50 μM of each fragment), blocked labelling of recombinant MLTK kinase
by a previously reported ibrutinib-derived activity probe 59 (top; ref. 20).
A C22A-MLTK mutant did not react with the ibrutinib probe. Anti-Flag
blotting confirmed similar expression of wild-type and C22A-MLTK
proteins, which were expressed as Flag-fusion proteins in HEK293T cells
(bottom). g, Lysates from HEK293T cells expressing wild type or C22A
MLTK were treated with the indicated fragments and then an ibrutinib-derived
activity probe 59 (ref. 20) at 10 μM. MLTK labelling by 59 was detected by
CuAAC conjugation to a rhodamine-azide tag and analysis by SDS-PAGE
and in-gel fluorescence scanning. h, Apparent IC50 curve for blockade of
ibrutinib probe-labelling of MLTK by 60. i, 60, but not control fragment 3
(100 μM of each fragment), inhibited the kinase activity of wild-type, but
not C22A-MLTK. For c, h and i, data represent mean values ± s.e.m. for at
least three independent experiments. Statistical significance was calculated
with unpaired Student's t-tests comparing DMSO- to fragment-treated
samples; **P < 0.01.


© 2016 Macmillan Publishers Limited. All rights reserved 


LETTER 


Extended Data Figure 5 | Confirmation and functional analysis
of fragment-cysteine interactions in IMPDH2 and TIGAR.
a, Representative MS1 ion chromatograms for IMPDH2 tryptic peptides
containing the catalytic cysteine, C331, and Bateman domain cysteine,
C140, quantified by isoTOP-ABPP in cell lysates treated with the indicated
fragments (500 μM each). b, Structure of human IMPDH2 (PDB accession
1NF7) (light grey) and its structurally unresolved Bateman domain
modelled by I-TASSER* (dark grey) showing the positions of C331
(red spheres), ribavirin monophosphate and C2-mycophenolic adenine
dinucleotide (blue), and C140 (yellow spheres). c, Click probe 18 (25 μM)
labelled wild-type IMPDH2 and C331S IMPDH2, but not C140S
IMPDH2 (or C140S/C331S IMPDH2). Labelling was detected by CuAAC
conjugation to a rhodamine-azide reporter tag and analysis by SDS-PAGE
and in-gel fluorescence scanning. Recombinant IMPDH2 wild type and
mutants were expressed and purified from Escherichia coli and added
to Jurkat lysates to a final concentration of 1 μM protein. d, Fragment
reactivity with recombinant, purified IMPDH2 added to Jurkat lysates
to a final concentration of 1 μM protein, where reactivity was detected in
competition assays using the click probe 18 (25 μM). Note that 18 reacted
with wild-type and C331S IMPDH2, but not C140S or C140S/C331S
IMPDH2. e, Nucleotide competition of 18 (25 μM) labelling of wild-type
IMPDH2 added to MDA-MB-231 lysates to a final concentration of 1 μM
protein. f, Nucleotide competition profile for 18 labelling of recombinant
wild-type IMPDH2 (500 μM of each nucleotide). g, Apparent IC50 curve
for blockade of 18 labelling of IMPDH2 by ATP. h, Representative MS1
chromatograms for TIGAR tryptic peptides containing C114 and C161
quantified by isoTOP-ABPP in cell lysates treated with the indicated
fragments (500 μM each). i, Crystal structure of TIGAR (PDB accession
3DCY) showing C114 (red spheres), C161 (yellow spheres), and inorganic
phosphate (blue). j, Labelling of recombinant, purified TIGAR and
mutant proteins by the IA-rhodamine (2 μM) probe. TIGAR proteins
were added to MDA-MB-231 lysates, to a final concentration of 2 μM
protein. k, 5, but not control fragment 3, blocked IA-rhodamine (2 μM)
labelling of recombinant, purified C161S TIGAR (2 μM protein
doped into Ramos cell lysates). l, Apparent IC50 curve for blockade of
IA-rhodamine labelling of C161S TIGAR by 5. m, 5, but not control
fragment 3 (100 μM of each fragment), inhibited the catalytic activity of
wild-type TIGAR and C161S TIGAR, but not C114S TIGAR or C114S/C161S
TIGAR. n, Concentration-dependent inhibition of wild-type TIGAR by 5.
Note that the C114S-TIGAR mutant was not inhibited by 5. Data represent
mean values ± s.e.m. for four replicate experiments at each concentration.
For f, g and l–n, data represent mean values ± s.e.m. for at least three
independent experiments. Statistical significance was calculated with
unpaired Student's t-tests comparing DMSO- to fragment-treated samples;
**P < 0.01, ****P < 0.0001.
Extended Data Figure 6 | IDH1-related and general in situ activity
of fragment electrophiles. a, X-ray crystal structure of IDH1 (PDB
accession 3MAS) showing the position of C269 and the frequently mutated
residue in cancer, R132. b, Blockade of 16 labelling of wild-type IDH1 by
representative fragment electrophiles. Recombinant, purified wild-type
IDH1 was added to MDA-MB-231 lysates at a final concentration of 2 µM,
treated with fragments at the indicated concentrations, followed by
IA-rhodamine probe 16 (2 µM) and analysis by SDS-PAGE and in-gel
fluorescence scanning. Note that a C269S mutant of IDH1 did not label
with IA-rhodamine 16. c, d, Reactivity of 20 and control fragment 2 with
recombinant, purified wild-type IDH1 (c) or R132H IDH1 (d) added
to MDA-MB-231 lysates to a final concentration of 2 or 4 µM protein,
respectively. Fragment reactivity was detected in competition assays using
the IA-rhodamine probe (2 µM). e, f, Apparent IC₅₀ curve for blockade of
IA-rhodamine labelling of IDH1 (e) and R132H IDH1 (f) by 20. Note
that the control fragment 2 showed much lower activity. g, Representative
MS1 ion chromatograms for the IDH1 tryptic peptides containing
liganded cysteine C269 and an unliganded cysteine C379 quantified by
isoTOP-ABPP in MDA-MB-231 lysates treated with fragment 20 (25 µM).
h, 20, but not 2, inhibited IDH1-catalysed oxidation of isocitrate to
α-ketoglutarate (α-KG) as measured by an increase in NADPH production
(340 nm absorbance). 20 did not inhibit the C269S-IDH1 mutant.
i, 20 inhibited oncometabolite 2-hydroxyglutarate (2-HG) production
by R132H IDH1. MUM2C cells stably overexpressing the oncogenic
R132H-IDH1 mutant or control green fluorescent protein (GFP)-
expressing MUM2C cells were treated with the indicated fragments
(2 h, in situ). Cells were harvested and lysed, and IDH1-dependent production
of 2-HG from α-KG and NADPH was measured by LC-MS, from
which 2-HG production of GFP-expressing MUM2C cells was subtracted
(GFP-expressing MUM2C cells produced <10% of the 2-HG generated by
R132H-IDH1-expressing MUM2C cells). j, Western blot of MUM2C
cells stably overexpressing GFP (mock) or R132H-IDH1 proteins.
k, Representative MS1 chromatograms for the IDH1 tryptic peptides
containing liganded cysteine C269 and an unliganded cysteine C379
quantified by isoTOP-ABPP in R132H-IDH1-expressing MUM2C lysates
treated with 20 or control fragment 2 (50 µM, 2 h, in situ). l, Proteomic
reactivity values for individual fragments are comparable in vitro and
in situ. One fragment (11) marked in red showed notably lower reactivity
in situ versus in vitro. Reactivity values were calculated as in Fig. 1c.
Dashed lines mark 90% prediction intervals for the comparison of in vitro
and in situ proteomic reactivity values for fragment electrophiles. Blue
and red circles mark fragments that fall above (or just at) or below
these prediction intervals, respectively. m, Fraction of cysteines liganded
in vitro that are also liganded in situ. Shown are liganded cysteine numbers
for individual fragments determined in vitro and the fraction of these
cysteines that were liganded by the corresponding fragments in situ.
n, Representative cysteines that were selectively targeted by fragments
in situ, but not in vitro. For in situ-restricted fragment-cysteine
interactions, a second cysteine in the parent protein was detected with
an unchanging ratio (R ≈ 1), thus controlling for potential fragment-
induced changes in protein expression. For e, f, h and i, data represent
mean values ± s.e.m. for at least three independent experiments. Statistical
significance was calculated with unpaired Student’s t-tests comparing
DMSO- to fragment-treated samples; ****P < 0.0001.
Extended Data Figure 7 | Fragment electrophiles that target
pro-CASP8. a, Representative MS1 chromatograms for CASP8 tryptic
peptide containing the catalytic cysteine C360 quantified by isoTOP-ABPP
in cell lysates or cells treated with fragment 4 (250 µM, in vitro; 100 µM,
in situ) and control fragment 21 (500 µM, in vitro; 200 µM, in situ).
b, Neither 7 nor control fragment 62 (100 µM each) inhibited
recombinant, purified active CASP3 and CASP8 assayed using
N-acetyl-Asp-Glu-Val-Asp-7-amino-4-methylcoumarin (DEVD-AMC)
and Ac-Ile-Glu-Thr-Asp-7-amino-4-trifluoromethylcoumarin
(IETD-AFC) fluorogenic substrates, respectively. DEVD-CHO (20 µM)
inhibited both caspases. c, Fragment reactivity with recombinant, purified
active CASP8 added to cell lysates, where reactivity was detected in
competition assays using the caspase activity probe Rho-DEVD-AOMK
(2 µM, 1 h). d, Western blot of MDA-MB-231, Jurkat and
CASP8-null Jurkat proteomes showing that CASP8 was only
found in the pro-enzyme form in these cells. e, Fragment reactivity with
recombinant, purified pro-CASP8 (D374A, D384A, C409S) added to
cell lysates to a final concentration of 1 µM protein, where reactivity was
detected in competition assays with the IA-rhodamine probe (2 µM).
Note that mutation of both Cys360 and Cys409 to Ser prevented labelling
of pro-CASP8 by IA-rhodamine. f, Inactive control fragment 62 did not
compete IA-rhodamine labelling of C360 of pro-CASP8. g, Apparent IC₅₀
curve for blockade of IA-rhodamine labelling of pro-CASP8 (C409S) by 7.
h, 7 (50 µM) fully competed IA-alkyne labelling of C360 of endogenous
CASP8 in cell lysates as measured by isoTOP-ABPP. Representative MS1
chromatograms are shown for the C360-containing peptide of CASP8.
i, Concentration-dependent reactivity of click probe 61 with recombinant,
purified pro-CASP8 (D374A, D384A) added to cell lysates to a final
concentration of 1 µM protein. Note that pre-treatment with 7 blocked
61 reactivity with pro-CASP8 and mutation of C360 to Ser prevented
labelling of pro-CASP8 by 61 (25 µM). j, 7 (30 µM) blocked IA-alkyne
labelling of C360 of pro-CASP8, but not active CASP8, as measured by
isoTOP-ABPP. Recombinant pro- and active CASP8 were added to Ramos
lysates at 1 µM and then treated with 7 (30 µM) followed by isoTOP-ABPP.
k, Fragments 7 and 62 did not block labelling by Rho-DEVD-AOMK
(2 µM) of recombinant, purified active CASP8 and active CASP3 added to
MDA-MB-231 cell lysates to a final concentration of 1 µM protein.
l, 7 does not inhibit active caspases. Recombinant, active caspases
were added to MDA-MB-231 lysate to a final concentration of 200 nM
(CASP2, 3, 6, 7) or 1 µM (CASP8, 10), treated with z-Val-Ala-Asp(OMe)-
fluoromethyl ketone (z-VAD-FMK) (25 µM) or 7 (50 µM), followed by
labelling with the Rho-DEVD-AOMK probe (2 µM). m, Representative
MS1 chromatograms for tryptic peptides containing the catalytic cysteines
of CASP8 (C360), CASP2 (C320), and CASP7 (C186) quantified by
isoTOP-ABPP in Jurkat cell lysates treated with 7 or 62 (50 µM, 1 h).
n, 7, but not control fragment 62, blocked extrinsic, but not intrinsic
apoptosis. Jurkat cells (1.5 million cells per ml) were incubated with 7 or
62 (30 µM) or the pan-caspase inhibitor VAD-FMK (100 µM) for 30 min
before addition of staurosporine (2 µM) or SuperFasLigand (100 ng ml⁻¹).
Cells were incubated for 4 h and viability was quantified with CellTiter-
Glo. RLU, relative light unit. o, For cells treated as described in n, cleavage
of PARP (96 kDa), CASP8 (p43/p41, p18), and CASP3 (p17) was visualized
by western blot. p, 7 protects Jurkat cells from extrinsic, but not intrinsic
apoptosis. Cleavage of PARP, CASP8 and CASP3 detected by western
blotting as shown in o was quantified for three (STS) or two (FasL)
independent experiments. Cleavage products (PARP (96 kDa), CASP8
(p43/p41), CASP3 (p17)) were quantified for compound treatment and
the percentage cleavage relative to DMSO-treated samples was calculated.
For b, g and n, data represent mean values ± s.e.m. for at least three
independent experiments. For p, STS data represent mean values ± s.e.m.
for three independent experiments, and FasL data represent mean
values ± s.d. for two independent experiments. Statistical significance was
calculated with unpaired Student's t-tests comparing active compounds
(VAD-FMK and 7) to control compound 62; **P < 0.01, ***P < 0.001,
****P < 0.0001.


© 2016 Macmillan Publishers Limited. All rights reserved 



Extended Data Figure 8 | CASP10 is involved in intrinsic apoptosis in
primary human T cells. a, Representative MS1 peptide signals showing
R values for caspases detected by quantitative proteomics using probe 61
in ABPP-SILAC experiments. Jurkat cells (10 million cells) were treated
with either DMSO (heavy cells) or the indicated compounds (light cells)
for 2 h followed by probe 61 (10 µM, 1 h). b, 7 blocked 61 labelling of
pro-CASP8 and CASP10, whereas 63-R selectively blocked probe labelling
of pro-CASP8. c, 7, but not 63-R, blocked probe labelling of pro-CASP10.
Recombinant pro-CASP10 was added to MDA-MB-231 lysates to a final
concentration of 300 nM, treated with the indicated compounds, and
labelled with probe 61. Mutation of the catalytic cysteine C401A fully
prevented labelling by 61. d, Apparent IC₅₀ curve for blockade of 61
labelling of pro-CASP10 by 7, 63-R or 63-S. e, Neither 7 nor 63-R
(25 µM each) inhibited the activity of recombinant, purified active
CASP10 (500 nM), which was assayed after addition of the protein to
MDA-MB-231 lysate using fluorometric Ac-Ala-Glu-Val-Asp-7-amino-4-
methylcoumarin (AEVD-AMC) substrate. DEVD-CHO (20 µM) inhibited
the activity of CASP10. f, Apparent IC₅₀ curve for blockade of 61 labelling
of pro-CASP8 and pro-CASP10 by 63-R. g, 63-R shows increased potency
against pro-CASP8. Recombinant pro-CASP8 was added to MDA-MB-231
lysates to a final concentration of 300 nM, treated with the indicated
compounds and labelled with probe 61. h, Apparent IC₅₀ curve for
blockade of 61 labelling of pro-CASP8 by 63-R compared with 63-S. The
structure of 63-S is shown. i, CASP10 is more highly expressed in primary
human T cells compared to Jurkat cells. Western blot analysis of full-length
CASP10, CASP8 and GAPDH expression levels in Jurkat and T-cell
lysates (2 mg ml⁻¹). j, Jurkat cells (150,000 cells per well) were incubated
with 7 or 63-R at the indicated concentrations for 30 min before addition
of staurosporine (2 µM) or SuperFasLigand (100 ng ml⁻¹). Cells were
incubated for 4 h and viability was quantified with CellTiter-Glo (CTG).
k, Jurkat cells treated as in j, but with 63-R or 63-S. l, HeLa cells
(20,000 cells per well) were seeded and 24 h later treated with the
indicated compounds for 30 min before the addition of SuperFasLigand
(100 ng ml⁻¹) and cycloheximide (CHX; 2.5 ng ml⁻¹). Cells were incubated
for 6 h and viability was quantified with CTG. m, For T cells treated as in
Fig. 4d, cleavage of CASP10 (p22), CASP8 (p18), CASP3 (p17) and RIPK
(33 kDa) was visualized by western blotting. For d-f, h and j-k, data
represent mean values ± s.e.m. for at least three independent experiments.
Statistical significance was calculated with unpaired Student's t-tests
comparing DMSO- to fragment-treated samples; **P < 0.01, ****P < 0.0001.
Extended Data Table 1 | Ligandability of representative cysteines and proteins

a.
Protein | Cysteine | Fragment(s) | Other cysteines quantified by isoTOP-ABPP | Previous covalent inhibitor(s) | Cysteine location | Reference
BTK | C481 | 2, 3, 14, 31 | C145, C337 | Ibrutinib | Active site | [illegible]
TGM2 | C277 | 12, 14, 32 | C10, C27, C230, C269, C290, C336, C370, C524, C545, C620 | 18d | Active site | [illegible]
MAP2K7 | C131 | [illegible] | C260, C280 | Ibrutinib | Active site | 29
XPO1 | C528 | 2, 3, 5, 14, 24, 31, 43, 56 | C34, C119, C164, C199, C327, C498, C723, C1070 | [illegible] | Non-active site | 46
[illegible] | C315 | [illegible] | - | Z-WEHD-CHO/FMK | Active site | 47
CASP8 | C360 | 2, 4, 11 | C236, C409 | Z-VAD-FMK, CV8/9-AOMK | Active site | 48, 49
ERCC3 | C342 | 2, 3, 5, 8, 14, 21 | - | Triptolide | Active site | [illegible]
PARK7 (Toxoplasma DJ-1) | C106 | 2, 9, 8, 11, 13, 48, 45, 50, 52 | C46, C53 | WRR-086 | Active site | [illegible]
GSTO1 | C32 | 2-13, 16, 18-22, 33, 27-30, 32-34, 36, 39, 43, 49, 50, 52, 54, 55 | C90, C192, C237 | KT53 | Active site | 52
ALDH2 | C319 | 3, 8-10, 12, 27, 28, 32, 39, 40, 43, 49, 50 | C66, C179, C386, C472 | Disulfiram | Active site | [illegible]
CTSZ | C92 | 4, 11, 20, 28, 32 | C89, C126, C132, C154, C170, C173, C179, C214 | Cy5DCG04 | Active site | [illegible]

b.
Protein | Cysteine | Fragment # | Peptide | M+H calculated (m/z) | M+H observed (m/z) | Charge
IMPDH2 | C140 | 14 | R.HGFCGIPITDTGR.M | 715.86 | 715.86 | 2
TIGAR | C114 | 5 | R.EECPVFTPPGGETLDQVK.M | 1123.97 | 1123.97 | 2
IDH1 | C269 | 20 | K.SEGGFIWACK.N | 702.84 | 702.84 | 2
CASP8 | C360 | 7 | K.VFFIQACQGDNYQK.G | 660.98 | 660.98 | 3
CASP8* | C360 | 61-TEV-Tag | K.VFFIQACQGDNYQK.G | 1195.58 (light) and 1198.59 (heavy) | 1195.58 (light) and 1198.59 (heavy) | 2

a, Representative cysteines with known covalent ligands targeted by fragment electrophiles in isoTOP-ABPP experiments. b, Site of fragment labelling for recombinant proteins. Underlines mark the
fragment-modified cysteines. *Measured for endogenous protein by isoTOP-ABPP using probe 61.




Extended Data Table 2 | Reactive docking results for liganded cysteines

Protein | PDB ID | Most ligandable cysteine by docking | Cysteine location | Most ligandable cysteine by isoTOP-ABPP | Match | Heat sensitive
Aldh2 | 1O05 | C319 | Active site | C319 | Yes | Yes
BTK | 1K2P | C481 | Active site | C481 | Yes | ND
CASP8 | 1QTN | C360 | Active site | C360 | Yes | Yes
CCNB1 | 2JGZ | C238 | Non-active site | C238 | Yes | Yes
CDKN3 | 1FQ1 | C39 | Non-active site | C39 | Yes | Yes
CLIC4 | 2AEH | C35 | Non-active site | C35 | Yes | Yes
DTYMK | 1E2G | C163 | Non-active site | C163 | Yes | No
IDH1 | 3MAP | C269 | Non-active site | C269 | Yes | Yes
IMPDH2 | 1NF7 | C331 | Active site | C331, C140 | Yes | Yes
GLRX5 | 2WUL | C67 | Active site | C67 | Yes | No
GSTO1 | 1EEM | C32 | Active site | C32 | Yes | Yes
NME3 | 1ZS6 | C158 | Non-active site | C158 | Yes | Yes
PKM | 4JPG | C423 | Non-active site | C423 | Yes | Yes
SRC | 2SRC | C277 | Active site | C277 | Yes | ND
TIGAR | 3DCY | C114 | Non-active site | C114, C161 | Yes | Yes
TXNDC | 1WOU | C43 | Active site | C43 | Yes | Yes
UGDH | 3ITK | C276 | Active site | C276 | Yes | Yes
UPP1 | 3EUF | C162 | Non-active site | C162 | Yes | Yes
XPO1 | 3GB8 | C528 | Non-active site | C528 | Yes | Yes
CDK5 | 1UNG | C157 | Non-active site | C269 | Second | Yes
EDC3 | 3D3K | C311 | Non-active site | C137, C413, C499 | Second | ND
NR2F2 | 3CJW | C213 | Non-active site | C326, C213 (in situ) | Second | ND
PDCD6IP | 2R02 | C231 | Non-active site | C90 | Second | ND
PRMT1 | 1ORI | C285 | Active site | C109 | Second | Yes
UBE2S | 1ZDN | C118 | Non-active site | C95 | Second | ND
FNBP1 | 2EFL | C145 | Non-active site | C70 | No | ND
HAT1 | 2P0W | C120 | Non-active site | C101 | No | Yes
MAPK9 | 3NPC | C163 | Active site | C177 | No | ND
STAT1 | 1YVL | C543 | Non-active site | C492, C255 | No | ND

Shows the most ligandable cysteine predicted by reactive docking. Match indicates whether the top cysteine by docking matches that identified by isoTOP-ABPP. Heat sensitive column indicates
whether the top cysteine identified by covalent docking is sensitive to heat denaturation. ND, not detected.
Structural basis of N⁶-adenosine methylation by the
METTL3-METTL14 complex


Xiang Wang¹*, Jing Feng¹*, Yuan Xue¹*, Zeyuan Guan¹, Delin Zhang¹, Zhu Liu’, Zhou Gong*, Qiang Wang¹, Jinbo Huang’,
Chun Tang*’, Tingting Zou¹ & Ping Yin¹


Chemical modifications of RNA have essential roles in a vast 
range of cellular processes'~*. N⁶-methyladenosine (m⁶A) is an
abundant internal modification in messenger RNA and long non- 
coding RNA that can be dynamically added and removed by RNA 
methyltransferases (MTases) and demethylases, respectively”*. An 
MTase complex comprising methyltransferase-like 3 (METTL3) 
and methyltransferase-like 14 (METTL14) efficiently catalyses 
methyl group transfer®’. In contrast to the well-studied DNA 
MTase®, the exact roles of these two RNA MTases in the complex 
remain to be elucidated. Here we report the crystal structures of 
the METTL3-METTL14 heterodimer with MTase domains in
the ligand-free, S-adenosyl methionine (AdoMet)-bound and 
S-adenosyl homocysteine (AdoHcy)-bound states, with resolutions 
of 1.9, 1.71 and 1.61 Å, respectively. Both METTL3 and METTL14
adopt a class I MTase fold and they interact with each other via 
an extensive hydrogen bonding network, generating a positively 
charged groove. Notably, AdoMet was observed in only the 
METTL3 pocket and not in METTL14. Combined with biochemical 
analysis, these results suggest that in the m⁶A MTase complex,
METTL3 primarily functions as the catalytic core, while METTL14
serves as an RNA-binding platform, reminiscent of the target
recognition domain of DNA N⁶-adenine MTase”””. This structural
information provides an important framework for the functional
investigation of m⁶A.

N⁶-methyladenosine is a prevalent RNA modification in species
including viruses, bacteria'!, yeasts!”, plants}? and mammals!*!». It
functions in multiple aspects of developmental regulation’®, the cell 
cycle!’, fate determination!*!°, and the heat shock stress response”” by 
affecting aspects of RNA metabolism such as pre-mRNA processing”’, 
translation efficiency**”’, transcript stability”4 and miRNA biogenesis”. 
Three distinct classes of protein factor are involved in the function of 
the m⁶A modification*”: ‘writers’ (adenosine MTases)®”!®”, ‘eras-
ers’ (m⁶A-demethylating enzymes)*””* and ‘readers’ (m⁶A-binding
proteins)**”°. The writers and erasers reversibly install and remove
this modification, respectively, thereby generating a dynamic m⁶A
landscape’. The readers, known as the YTH domain family”*”?, bind
selectively to the m⁶A-containing sequence and contribute to the
determination of RNA fate. Although the erasers and readers have
been well characterized, the lack of structures of writers remains a
major obstacle to the elucidation of the versatile functions of m⁶A.

In humans, two MTases, METTL3 (also known as MT-A70) and
METTL14 participate in this modification as ‘writers’®’. Sequence
analysis indicates that both proteins belong to the class I MTase
family*? (Extended Data Fig. 1) and they form a core catalytic complex
that is regulated by an additional subunit, Wilms’ tumour 1-associating
protein (WTAP)”!®?%. Individually, METTL3 and METTL14
exhibit comparable weak MTase activity in vitro. However, the
METTL3-METTL14 complex has much higher catalytic activity®”.
The mechanism by which the MTases function synergistically awaits
structural investigation.

To elucidate the mechanism of m⁶A methylation by the METTL3-
METTL14 complex, we determined the crystal structure of the
core METTL3-METTL14 complex comprising the MTase domain
(METTL3, residues 369-570; METTL14, residues 117-402) (Fig. 1a)
in the space group P4₁2₁2, using bromide-based single-wavelength
anomalous diffraction, at a refined resolution of 1.9 Å (Extended Data
Table 1). Additionally, residues 200-204 in METTL14 could not be
modelled. In the crystal, one METTL3 molecule and one METTL14
molecule form an antiparallel heterodimer in the asymmetric unit,
resulting in an overall butterfly appearance with a width of approxi-
mately 40 Å and a length of approximately 70 Å (Fig. 1b).

We traced the METTL3 MTase domain and the METTL14 MTase
domain, which is adjacent to the N-terminal α-helical motif (NHM)
and to the C-terminal motif (CTM) with a phosphoserine at posi-
tion 399 (Fig. 1b). The NHM extends across the MTase domain of
METTL14 and to the MTase domain of METTL3. The MTase domain
of METTL3 adopts a classic α-β-α sandwich fold comprising a mixed
eight-stranded β-sheet with a strand order of β1, β8, β7, β2, β3,
β5, β4 and β6 flanked by four α-helices (α1, α2 and α4 on one
side, and α3 on the other side) and three 3₁₀ helices (Fig. 1c). The
overall structure of the METTL3 MTase domain primarily resembles
that of the class I DNA N⁶-adenine MTase®*!?”° (Fig. 1c and Extended
Data Fig. 2a). Nevertheless, the MTase domains of METTL3 and
METTL14 lack an additional element similar to the target recognition
domain (TRD) of DNA N⁶-adenine MTase, which functions as the
substrate-binding scaffold®. Consistent with the 22% sequence identity
between the MTase domains of METTL3 and METTL14 (Extended
Data Fig. 1), the structures of the two domains were superimposed
with a root-mean-squared deviation (r.m.s.d.) of approximately 0.71 Å
over 211 Cα atoms, excluding the NHM and CTM domains (Extended
Data Fig. 2b). Interestingly, three loops with lower sequence simi- 
larities exhibited distinct conformations (Extended Data Figs 1, 2b), 
suggesting that they have different functions. We refer to these loops 
as gate loop 1 (residues 396-410 in METTL3), interface loop (residues 
462-479 in METTL3) and gate loop 2 (residues 507-515 in METTL3; 
Fig. 1c). 
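
The r.m.s.d. values quoted for the superimposed domains follow the standard definition: the square root of the mean squared distance between paired Cα atoms. The sketch below illustrates only that final step and assumes the two coordinate sets have already been optimally superimposed and paired one-to-one; the superposition itself is not shown. The function name and the toy coordinates are hypothetical and are not taken from the deposited structures.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z) points.

    Assumes the two sets are already superimposed and listed in matching order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared Euclidean distance between one pair of matched atoms
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical toy coordinates in Å (three matched atoms, not real Cα positions)
domain_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
domain_b = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (7.6, 0.0, 1.0)]

# Every matched pair is exactly 1 Å apart, so the r.m.s.d. is 1 Å
print(round(rmsd(domain_a, domain_b), 2))
```

In practice the paired atoms would come from a structure-based alignment of the two MTase domains, which is why the text specifies the number of Cα atoms included.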

METTL3 and METTL14 adopt the conformation of a tight asym-
metric heterodimer with an interface area of approximately 2,500 Å²
and engage in extensive water-mediated hydrogen-bond interactions
with each other. These interactions are mediated by interfaces 1 and 2
(Extended Data Fig. 3a). Interface 1 primarily comprises the inter-
face loop (residues 462-479) of METTL3, NHM, and the long loop
connecting β5 and β6 of METTL14 (Extended Data Fig. 3b, left and
middle). Additionally, R471 of METTL3 interacts with the C-terminal 
Figure 1 | Structural overview of the METTL3-METTL14 complex.
a, Schematic domain structures of METTL3 and METTL14. METTL3
MTase (residues 369-580), METTL14 NHM (residues 116-163),
MTase domain (residues 165-378), and CTM (residues 380-402) are
magenta, cyan, green, and yellow, respectively. b, Overall structure of the
METTL3-METTL14 heterodimer. Residue S399 (red stick) represents the
phosphorylation modification. c, The METTL3 (magenta) structure in two
perpendicular views. Gate loops 1 and 2 are orange, and the interface loop
is blue. METTL14 has been removed for clarity. All structure figures were
prepared using PyMOL.


phosphorylated S399 of METTL14 via a salt bridge, confirming the
important regulatory role of phosphorylation. Interface 2 contains
helix α2 (residues 438-447) and strand β4 (residues 450-460) of
METTL3 and the corresponding helix α2, strand β4 and an inter-
face loop (residues 266-284) from METTL14 (Extended Data Fig. 3b,
right). These interfaces allow the two MTases to bind each other 
tightly, and the extensive interaction networks are difficult to disrupt. 

After extensive trials, we determined the structure of METTL3- 
METTL14 in complex with AdoMet using a soaking approach and 
refined the structure to 1.71 Å resolution (Extended Data Table 1).
In the crystal, one METTL3-METTL14 heterodimer was present in 
each asymmetric unit (Fig. 2a). Following assignment of most amino 
acids into the electron density map, electron densities indicative of 
one AdoMet became clearly visible in METTL3 (Fig. 2b), whereas no 
additional electron density was observed in METTL14 (Extended Data 
Fig. 4). The AdoMet molecule is positioned at the end of β1, β7 and β8
(Fig. 2a). The AdoMet-binding site faces the most conserved DPPW 
motif (residues 395-399) of gate loop 1 (Fig. 2a, b). This orientation 
suggests a nucleophilic attack methyl-transfer mechanism, reminiscent 
of the DNA N⁶-adenine MTase’”.

The AdoMet molecule is primarily coordinated by eleven residues of 
METTL3 via extensive hydrogen bonds (Fig. 2c). The adenine moiety
of AdoMet is recognized by the side chain of D377 and the main chain
of I378. The hydroxyl groups of ribose are surrounded by Q550, N549
and R536. Several residues, including D395, K513, H538 and N539, 
contact the methionine moiety of AdoMet directly, while E532 and 
L533 form water-mediated interactions with AdoMet (Fig. 2c). The 
importance of these residues in AdoMet coordination is supported by 
mutational analysis. D377A, D395A, N539A and E532A mutations 
completely abolished enzyme activity, while substitutions of R536, 
H538, N549 or Q550 with alanine moderately reduced enzyme activity 
Figure 2 | AdoMet is coordinated by METTL3 in the binary complex.
a, Ribbon representation of METTL3-METTL14 in complex with
AdoMet. The bound AdoMet is illustrated as a green ball-and-stick
representation. The DPPW motif is shown as an orange stick surrounded
by a dashed line. b, Close-up view of the AdoMet binding site and
DPPW motif of METTL3 showing the electron density (blue) of AdoMet
contoured at 1σ. c, Schematic representation of the interactions between
METTL3 and AdoMet. Residues are shown with sticks. The side-chain
interactions and the main-chain interactions are shown in black and blue
letters, respectively. Water is shown as a red ball representation. AdoMet
is shown as green sticks. Hydrogen bonds are represented as red dashed
lines.


(Extended Data Fig. 5a). Neither the D377A nor the D395A mutant 
had detectable AdoMet-binding activity, as measured by isothermal 
titration calorimetry (ITC), compared to the wild-type complex, which 
exhibited a dissociation constant of approximately 1.5 µM and binding
stoichiometry (N) of approximately 1.15 (Extended Data Fig. 5b, c). 
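
To put the ITC-derived dissociation constant in context, a simple one-site equilibrium model gives the fraction of complex occupied at a given free AdoMet concentration as [L] / (Kd + [L]). The sketch below is only an illustration under that assumption (a single independent site, consistent with the reported stoichiometry of about 1, and free rather than total ligand concentration); the function name and the chosen concentrations are hypothetical.

```python
def fraction_bound(free_ligand_um, kd_um):
    """Fractional occupancy of a single binding site: [L] / (Kd + [L]).

    Both arguments are in micromolar; free_ligand_um is the free ligand concentration.
    """
    if free_ligand_um < 0 or kd_um <= 0:
        raise ValueError("ligand concentration must be >= 0 and Kd must be > 0")
    return free_ligand_um / (kd_um + free_ligand_um)


KD_UM = 1.5  # approximate dissociation constant reported in the text (µM)

# Hypothetical free AdoMet concentrations spanning 0.1x, 1x and 10x Kd
for conc_um in (0.15, 1.5, 15.0):
    print(f"{conc_um:>5} µM AdoMet -> {fraction_bound(conc_um, KD_UM):.3f} occupied")
```

At a free ligand concentration equal to Kd the site is half occupied, which is the defining property of the dissociation constant in this model.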

Analysis of the surface electrostatic potential of the AdoMet-bound 
complex revealed a positively charged groove between METTL3 and 
METTL14 adjacent to the AdoMet (Fig. 3a). This groove consists of 
at least ten positively charged residues: R465, R471, H474 and H478 
from the interface loop of METTL3 and R245, R249, R254, R255, K297 
and R298 from METTL14 (Fig. 3b). We speculated that this groove 
might be responsible for RNA binding. To test this hypothesis, we 
first replaced the interface loop (residues 462-479) of METTL3 with 
six alanine amino acids. Compared to the wild-type complex, this 
mutant exhibited weaker RNA binding activity and reduced MTase 
activity, but no effect on AdoMet binding was observed (Fig. 3c and 
Extended Data Fig. 6a, b). A similar result was obtained when the 
positively charged residues in METTL14 were mutated (Fig. 3c and 
Extended Data Fig. 6). These results suggest that the positively charged 
groove formed by METTL3 and METTL14 contributes to internal 
RNA binding. 

The 1.61 Å structure of METTL3-METTL14 in complex with
AdoHcy was also determined (Extended Data Table 1). In this structure, 
one AdoHcy molecule is positioned in the AdoMet-binding pocket of 
METTL3. AdoHcy adopts a configuration nearly identical to that of 
AdoMet, except for the ribose (Extended Data Fig. 7a, b). The over- 
all structures of the ligand-free, AdoMet-bound and AdoHcy-bound 
METTL3-METTL14 complexes are nearly identical, with an r.m.s.d. of 
0.24 Å over 438 Cα atoms (ligand-free with AdoMet-bound) and 0.12 Å
over 454 Cα atoms (AdoHcy-bound with AdoMet-bound; Fig. 4a).
The structural similarities between the ligand-free and ligand-bound 
METTL3-METTL14 complexes were also confirmed by small-angle 
X-ray scattering measurements in solution (Extended Data Fig. 7c). 
Figure 3 | Potential RNA-binding groove in the METTL3-METTL14
complex. a, Two views of the surface electrostatic potential calculated
with PyMOL. The gate loops and the interface loop of METTL3 are
highlighted by green dashed ellipses and a magenta dashed ellipse,
respectively (left). The potential RNA-binding groove is encircled by a
yellow dashed line (right). b, AdoMet in a space-filling representation.
Residues from METTL3 and METTL14 are magenta and cyan,
respectively. c, Measurement of the MTase activity of mutants of the
putative RNA-binding groove of the METTL3-METTL14 complex.
The indicated mutations were introduced to METTL3 (Loop to 6A) or
METTL14 (R245E, R254E & R255E, K297E & R298E). The error bars
represent the s.e.m. of three independent measurements.


Close inspection of the structures revealed that gate loop 1 and gate 
loop 2, which are adjacent to the AdoMet binding site, displayed large 
conformational changes upon ligand binding (Fig. 4a, b). Gate loop 1 
is flipped outwards in the AdoMet-bound state compared with the 
AdoHcy-bound state, and the ligand-free state adopts the same fold 
as the AdoHcy-bound state (Fig. 4b). Likewise, gate loop 2 undergoes 
a significant conformational rearrangement upon binding of AdoHcy 
or AdoMet, resulting in closure of the binding pocket (Fig. 4a, b). 
This rearrangement is reminiscent of the interaction of loops 1 and 
2 of M.TaqI with its DNA substrate (Extended Data Fig. 8a) and 
suggests that gate loops 1 and 2 have important roles in adenosine 
recognition’”. 

The METTL3-METTL14 complex displayed much higher catalytic 
activity than either METTL3 or METTL14 alone (Extended Data 
Fig. 5a). This result suggests that METTL14 enhances the MTase
activity of METTL3 via RNA binding, vice versa, or both. Our struc- 
tures suggest that the primary function of METTL14 is not to catalyse 
methyl-group transfer but to offer an RNA-binding scaffold (Fig. 4c), 
similar to the TRD of DNA MTases®-!° (Extended Data Fig. 8a, b). 
No positively charged area was observed near the potential AdoMet- 
binding pocket of METTL14 (Extended Data Fig. 8c). Most impor- 
tantly, although most of the residues involved in AdoMet binding 
are conserved between METTL3 and METTL14 (Extended Data 
Figs 1, 8d), mutations of two key residues in METTL14 had little effect 
on the AdoMet-binding and MTase activities of the binary complex 
(Extended Data Fig. 8e, f). Consistent with this result, AdoMet was 
observed in only the METTL3 pocket in the crystal structure (Fig. 2). 
Nevertheless, we cannot exclude the possibility that METTL14 pos- 
sesses MTase activity under certain conditions. 

A recent study identified a METTL3 protein interaction network 
comprising additional components such as METTL14, WTAP and 
KIAA1429 (refs 7, 16, 26). Perturbation of these factors alters global 
m⁶A levels, resulting in epitranscriptomic changes*”. Additionally,
Figure 4 | Proposed working model of the METTL3-METTL14
complex. a, Structural superimposition of the ligand-free (orange),
AdoMet-bound (green) and AdoHcy-bound (cyan) METTL3-METTL14
complexes shows the conformational changes in gate loops 1 and 2.
AdoMet is shown as green balls-and-sticks. b, Close-up view of gate loops
1 and 2. The conformational change is highlighted with a blue dashed
arrow. c, Proposed working model for m⁶A modification by the METTL3-
METTL14 complex. METTL3 (magenta) primarily functions as a catalytic
core, and METTL14 (cyan) serves as an RNA-binding scaffold. The
substrate RNA (magenta ribbon) is cooperatively coordinated by METTL3
and METTL14. The adenine base (black) points to the AdoMet binding
site in METTL3 surrounded by the two gate loops (green).
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the complex exhibited a substrate sequence preference (Extended 
Data Fig. 9). Further biochemical and structural characterization of 
the m®°A writer complex containing regulatory factors, substrate RNA 
or both is required to completely elucidate the molecular basis of m°A 
modification. The structures reported here provide unprecedented 
mechanistic insight into m°A RNA methylation and new opportunities 
for the development of therapeutic agents, and serve as an important 
foundation for understanding m°A epitranscriptomics. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS 


No statistical methods were used to predetermine sample size. The experiments 
were not randomized and the investigators were not blinded to allocation during 
experiments and outcome assessment. 

METTL3 and METTL14 construction, expression and purification. The METTL3 
and METTL14 genes were amplified from a Homo sapiens cDNA library using the 
following primer pairs: METTL3-M1-F (5′-ATGTCGGACACGTGGAGCT-3′) 
and METTL3-L580-R (5′-CTATAAATTCTTAGGTTTAGAGATGATAC-3′); and 
METTL14-M1-F (5′-GATAGCCGCTTGCAGGAGATCCG-3′) and 
METTL14-R456-R (5′-TCGAGGTGGAAAGCCACCTCTG-3′), and cloned into the T-vector. 
Both gene strands were verified by sequencing. The full length of genes METTL3 
and METTL14 was subcloned into a modified pFastBac1 vector with a His affinity 
tag fused to the N terminus. Bacmids were generated in DH10Bac cells following 
the instructions for the Bac-to-Bac baculovirus expression system (Invitrogen), 
and baculoviruses were generated and amplified in Sf-9 insect cells. For protein 
expression and purification, High Five (Trichoplusia ni) insect cells were grown 
in SIM HF medium (Sino Biological Inc.) supplemented with L-glutamine. The 
METTL3-METTL14 complex was co-expressed in High Five insect cells at 27°C 
for 72 h using the METTL3 and METTL14 viruses. All complex mutants were 
co-expressed using a mutant virus and a wild-type partner virus. Cells were har- 
vested by centrifugation at 2,000g for 15 min and homogenized in ice-cold lysis 
buffer containing 25 mM Tris-HCl, pH 8.0, 150 mM NaCl and 0.5 mM 
phenylmethanesulfonyl fluoride (PMSF). The cells were disrupted using a cell 
homogenizer. The insoluble fraction was precipitated by ultracentrifugation 
(20,000g) for 1 h at 4°C. The supernatant was loaded onto a Ni-NTA superflow 
affinity column 
(Qiagen) and washed three times with lysis buffer plus 10 mM imidazole. Elution 
was performed in buffer containing 25 mM Tris-HCl, pH 8.0, and 250 mM imi- 
dazole. The protein was further purified using anion-exchange chromatography 
(Source 15Q, GE Healthcare). The purified complex was concentrated to 
approximately 20 mg ml⁻¹ (Amicon 30-kDa cutoff, Millipore) and, if intended 
for crystallization, was digested with chymotrypsin (0.5 mg ml⁻¹) at room 
temperature for 30 min. 
The undigested or digested protein was subjected to size-exclusion chromatogra- 
phy (Superdex-200 Increase 10/300, GE Healthcare). The buffer used for size-ex- 
clusion chromatography contained 25 mM Tris-HCl, pH 8.0, 150 mM NaCl and 
5mM dithiothreitol (DTT). The peak fractions of the METTL3-METTL14 com- 
plex were pooled and immediately used for crystallization. 

Crystallization. Crystallization experiments were performed using the 
sitting-drop vapour diffusion method at 18°C by mixing equal volumes (1 μl) of 
protein (15 mg ml⁻¹) and reservoir solution. After several rounds of optimization, 
good-quality crystals appeared after several days and grew as a thin diamond to 
full size within 15 days in drops containing 18% (v/v) PEG 8000 (Sigma) and 
0.1 M sodium citrate, pH 5.7. The crystals were flash-frozen in liquid nitrogen 
and cryoprotected by adding ethylene glycol to a final concentration of 20%. The 
crystals were diffracted to 1.9 Å at the Shanghai Synchrotron Radiation Facility on 
beamlines BL17U and BL19U. To obtain phase information, high-quality crystals 
were immersed in cryoprotectant solution plus 0.3 M NaBr for 10 min. Before the 
crystals were harvested, 1 μl of a solution containing 25 mM Tris, pH 8.0, 150 mM 
NaCl, 20% (v/v) ethylene glycol and 0.1 M NaBr was added. The crystals were 
immediately transferred to a new solution containing 25 mM Tris, pH 8.0, 150 mM 
NaCl, ~18% (v/v) ethylene glycol and 0.3 M NaBr. Finally, the bromide-soaked 
crystals were diffracted to 2.6 Å resolution. 

To obtain the AdoMet-bound and AdoHcy-bound structures, we performed 
extensive trials. We initially failed to obtain diffracting crystals by co-crystallization 
of METTL3-METTL14 with ligand. We then systematically soaked high-quality 
diffracting crystals of METTL3-METTL14 with AdoMet or AdoHcy (Sigma). 
Native crystals of METTL3-METTL14 were obtained after at least 15 days of incu- 
bation at 18°C. The crystals were soaked with a series of ligand concentrations in 
the presence of 3mM ATP as an additive. The final concentrations of ligand used 
were 0.5, 1, 2 and 5mM. The crystals were soaked for 30 min to 72h, depending 
on their survival in the soaking solution. The crystals were examined under a 
microscope every 30 min. If the crystals appeared damaged, they were transferred 
to a cryoprotectant solution containing 25 mM Tris pH 8.0, 150mM NaCl and 18% 
(v/v) ethylene glycol. The crystals were collected and immediately flash-frozen in 
liquid nitrogen. 

Data collection and structure determination. All diffraction data were collected 
at the Shanghai Synchrotron Radiation Facility (SSRF) on beamlines BL17U or 
BL19U using a CCD detector cooled to 100 K. The data from the METTL3- 
METTL14 crystals were processed with the HKL2000 program suite and XDS 
packages. Further processing was performed using the programs from the CCP4 
suite. The ligand-free METTL3-METTL14 structure was solved via single 
anomalous diffraction (SAD) of bromide using the ShelxC/D/E program. Then, a crude 
model was manually built in the Coot program. The P4₁2₁2 crystal forms of the 
AdoMet-bound and AdoHcy-bound complexes were solved by molecular 
replacement with PHASER using the structure of the ligand-free METTL3-METTL14 
complex as the initial searching model. All four crystal structures were built using 
Coot and refined using the Phenix program. The data collection and structure 
refinement statistics are summarized in Extended Data Table 1. All figures 
representing structures were prepared with PYMOL. 

m6A methylation assay. A 5′-GGACUGGACUGGACUGGACU-3′ RNA probe 
containing four repeats of the canonical RRACH sequence was synthesized in vitro 
(Takara). Before reaction, the proteins were subjected to size-exclusion 
chromatography (Superdex-200 Increase 10/300, GE Healthcare) in running buffer 
containing 15 mM HEPES pH 7.3, 150 mM NaCl, 5 mM MgCl2, and 5 mM dithiothreitol 
(DTT). The 50-μl reaction mixture contained 15 mM HEPES pH 7.3, 50 mM KCl, 
50 mM NaCl, 1 mM MgCl2, 1 mM dithiothreitol, 4% glycerol, 0.04 μCi 
[methyl-³H]AdoMet (PerkinElmer), 2 nM RNA probe, and 250 ng purified protein. The 
solution was incubated at 30°C for 1 h. A 5′-GGGCUGGGCUGGGCUGGGCU-3′ 
RNA probe without adenine was used as a negative control. The reaction was 
quenched with 500 μl of 1:1 (v/v) Tris-phenol (pH 8.0):chloroform, followed by the 
addition of 450 μl double-distilled (dd)H2O. Then, the solution was centrifuged at 
20,000g for 15 min. The supernatant was transferred to a new tube and precipitated 
using an equal volume of isopropanol and 50 μg yeast tRNA at −20°C for 1 h. The 
precipitated RNA was dissolved in 100 μl ddH2O. The products were confirmed 
by immunoblotting using a commercial m6A antibody (Synaptic Systems, 
catalogue number 202 003, 1:3,000). The counts per minute (c.p.m.) of the RNA 
were measured in a scintillation counter (1450 MicroBeta Trilux, PerkinElmer). 
All experiments were repeated three times for each measurement. The average 
(±s.e.m.) c.p.m. was determined from three independent experiments. 
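
To make the difference between the two probes concrete, the short sketch below counts RRACH sites (R = A or G; H = A, C or U) in each sequence. The function name `count_rrach` and the regular expression are illustrative assumptions and are not part of the published protocol.

```python
import re

# RRACH consensus: R = purine (A/G), H = A, C or U.
# The lookahead lets overlapping sites be counted as well.
# This pattern is an illustrative assumption, not taken from the Methods.
RRACH = re.compile(r"(?=[AG][AG]AC[ACU])")


def count_rrach(rna: str) -> int:
    """Return the number of RRACH sites in an RNA sequence written 5'->3'."""
    return len(RRACH.findall(rna.upper()))


probe = "GGACUGGACUGGACUGGACU"    # substrate probe used in the assay
control = "GGGCUGGGCUGGGCUGGGCU"  # adenine-free negative control

print(count_rrach(probe), count_rrach(control))
```

The substrate probe contains one site per GGACU repeat, whereas the control contains none because it has no adenine to methylate.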

Isothermal titration calorimetry (ITC) assays. ITC experiments for the binding 
of AdoMet to the METTL3-METTL14 complex were performed at 25°C using 
Auto-iTC100 titration calorimetry (MicroCal). AdoMet (200 μM) was dissolved in 
reaction buffer containing 20 mM Tris-HCl, pH 8.0, and 150 mM NaCl (40 μl) and 
titrated against 20 μM wild-type or mutant METTL3-METTL14 complex (200 μl) 
in the same buffer. The first injection (0.5 μl) was followed by 19 injections of 2 μl. 
The heat of dilution values for AdoMet were measured by injecting AdoMet into 
buffer alone. The values were subtracted from the experimental curves before data 
analysis. The stirring rate was 750 r.p.m. The MicroCal ORIGIN software supplied 
with the instrument was used to determine the site-binding model that produced 
a good fit (low χ² value) for the resulting data. 

Electrophoretic mobility shift assay (EMSA). The ssRNA oligonucleotide 
5′-GGACUGGACUGGACUGGACU-3′ was radiolabelled at the 5′ end with 
[γ-³²P]ATP (PerkinElmer), catalysed by T4 polynucleotide kinase (Takara). In 
addition, the RNA was purified by centrifugation through a 2-cm bed of G-25 
size exclusion resin packed in a mini-spin column (GE Healthcare) and 
centrifuged at 750g for 2 min. For EMSA, proteins were incubated with approximately 
10 nM ³²P-labelled probe in final binding reactions containing 20 mM HEPES, 
pH 7.0, 5 mM MgCl2, 40 mM NaCl, 1.5 μM yeast tRNA and 10% glycerol for 
20 min at 25°C. The reactions were then resolved on 6% native acrylamide gels 
(37.5:1 acrylamide:bis-acrylamide) in 0.5× Tris-glycine buffer under an electric 
field of 15 V cm⁻¹ for 1 h. Gels were visualized on a phosphor screen (Amersham 
Biosciences) using a Typhoon Trio Imager (Amersham Biosciences). 

Small-angle X-ray scattering (SAXS) measurement. Solution SAXS data were 
collected at the National Center for Protein Science Shanghai using the BL19U2 
beamline at 18°C. The complex proteins for SAXS measurement were prepared at 
30 μM in buffer containing 25 mM HEPES, pH 7.0, 150 mM NaCl without ligand 
or in the presence of equimolar AdoMet or AdoHcy. For each measurement, 
20 consecutive frames of 1-s exposure time were recorded and were averaged after 
checking there was no difference between the first and last frames of the SAXS data. 
Similarly, the background data were recorded using the sample buffer and were 
subtracted from the protein patterns. 

Extended Data Figure 1 | Sequence alignment of human METTL3 and 
METTL14. Sequence alignment of Homo sapiens METTL3 (UniProt 
accession Q86U44) and METTL14 (UniProt accession Q9HCE5). The 
alignment was generated using the MultAlin and ENDscript programs. 
Secondary structural elements are shown above. Sequence identity is 
shown in white letters with a red background, and sequence similarity is 
shown in red letters. The coloured dots highlight functionally important 
positions (key: protein-interaction residues; AdoMet-interaction residues; 
positively charged residues for RNA binding). Residues of METTL3 and 
METTL14 that are involved in protein interactions are indicated by magenta 
and green dots, respectively. Cyan dots indicate residues that interact with 
AdoMet that were analysed by mutagenesis in this study. Blue dots represent 
residues that compose the RNA binding groove. The dots at the top and 
bottom of the sequences indicate residues from METTL3 and METTL14, 
respectively. Phosphoserine is highlighted by a red arrow. 

Extended Data Figure 2 | The MTase domains of METTL3 and METTL14 
adopt the class I MTase fold. a, Diagram of the METTL3-METTL14 
secondary structure profiles. METTL3 (magenta) and METTL14 (green) are 
boxed with a light teal background and a wheat background, respectively. 
The MTase domain contains an eight-stranded β-sheet (triangles) flanked 
by four α-helices and three 3₁₀-helices (circles). Structural elements are 
numbered by their linear order in the sequence. The loops in the front 
are indicated by black lines, and loops in the back are indicated by black 
dashed lines. b, Structural comparison of METTL3 and METTL14. Two 
perpendicular views of superimposed METTL3 and METTL14 coloured 
magenta and green, respectively. The NHM and CTM of METTL14 are 
coloured cyan and yellow, respectively. The main differences between the 
MTase domains of METTL3 and METTL14 are the two gate loops (orange) 
and the interface loop (blue). 

METTL3 and METTL14. a, The main interface of the METTL3- 
METTL14 heterodimer comprises interface 1 (boxed with orange and 
green rectangles) and interface 2 (boxed with a cyan rectangle), which 
generate an extensive water-mediated hydrogen network. METTL3 and 
METTL14 are coloured wheat and silver, respectively. The interface loop 
of METTL3 (blue) primarily contributes to the heterodimer interaction. 
b, Details of interfaces 1 and 2. Water is shown as a red ball. Hydrogen 
bonds are represented by red dashed lines. Residues from METTL3 
(magenta) and METTL14 (green) that are involved in interactions are 
shown as sticks. 

Extended Data Figure 4 | One AdoMet was located at the AdoMet 
binding site of METTL3. a, Lattice packing of AdoMet-bound complex. 
One AdoMet (green sphere) was coordinated by METTL3 (purple) but 
not METTL14 (green). The arrow shows the putative AdoMet-binding 
pocket. b, Stereo views of electron density map of AdoMet binding site 
of METTL3. 2Fo − Fc electron density (grey) of AdoMet binding site in 
METTL3, contoured at 1.0σ. AdoMet is shown as green balls-and-sticks and 
surrounding residues in magenta with the DPPW motif (orange). 
c, Representative 2Fo − Fc electron density (grey) of AdoMet binding site 
in METTL14, contoured at 1.0σ. The electron density of METTL14 (grey) 
is clearly visible and the EPPL motif is coloured orange. No additional 
apparent electron density was observed in the putative AdoMet binding 
site of METTL14. 

Extended Data Figure 5 | Mutagenesis analysis of the METTL3- 
METTL14-AdoMet interaction. a, Characterization of METTL3- 
METTL14 mutations affecting MTase activity. The indicated point 
mutations were introduced into METTL3. Each METTL3 mutant 
was co-expressed and purified with wild-type METTL14 as a binary 
complex and used for the MTase and ITC assays. Methylation yields were 
calculated based on the c.p.m. of the extracted tritium-labelled RNA 
probe. The c.p.m. of the extracted RNA was measured in a scintillation 
counter. The data are shown as mean ± s.e.m. from experiments that were 
independently repeated at least three times. All alanine substitutions 
resulted in remarkable decreases in activity. b, c, Measurement of the 
binding affinity between AdoMet and the METTL3-METTL14 complex 
(wild type and the METTL3 mutants D377A and D395A) using 
ITC. Individual peaks from titrations were integrated and presented 
in a Wiseman plot. The first dot was removed from the analysis. The 
dissociation constant (Kd) and the binding stoichiometry (N) of the wild 
type were approximately 1.5 μM and 1.15, respectively. The mutants 
exhibited undetectable AdoMet binding activity. 


© 2016 Macmillan Publishers Limited. All rights reserved 


LETTER 


[Extended Data Figure 6 panels. a, EMSA gel with lanes for the wild type, 
loop to 6A, R245E, R254E & R255E, and K297E & R298E complexes; the 
positions of the well and the free probe are marked. b, Two ITC traces 
(time in min; molar ratio) with fitted parameters: N = 0.93 ± 0.04, 
Kd = 1.6 ± 0.2 μM, ΔH = −12.1 ± 0.6 kcal/mol, ΔS = −14.0 cal/mol/deg; and 
N = 0.92 ± 0.01, Kd = 2.6 ± 0.2 μM, ΔH = −14.7 ± 0.2 kcal/mol, 
ΔS = −23.6 cal/mol/deg.] 


Extended Data Figure 6 | Biochemical analysis of the role of the 
potential RNA binding groove. a, RNA binding activity of the METTL3- 
METTL14 complex revealed by EMSA. The final concentrations of 
proteins in each set of five lanes (1-5, 6-10, 11-15, 16-20 and 21-25) were 
0, 0.19, 0.56, 1.67 and 5 μM, respectively. ‘Well’ indicates the top of native gel. 
The RNA-bound complex is highlighted by a black asterisk. The wild-type 
complex binds to the substrate RNA probe weakly (the dissociation 
constant is approximately 10 μM). All of the mutants showed moderately 
reduced RNA binding activity. These results suggested that the positively 
charged groove is involved in RNA interactions. For uncropped gels, see 
Supplementary Fig. 1. b, Measurement of the binding affinity between 
AdoMet and the METTL3-METTL14 complex mutants using ITC. These 
mutations in METTL3 or METTL14 had little effect on AdoMet binding 
activity. 
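
As a rough consistency check on the ITC parameters printed on the panels of this figure, the sketch below compares the binding free energy obtained from the dissociation constant (ΔG = RT ln Kd) with that obtained from the fitted enthalpy and entropy (ΔG = ΔH − TΔS). It assumes a 1 M standard state and uses the 25°C run temperature given in the Methods; the function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1
T = 298.15            # 25 degrees C, the stated ITC temperature, in kelvin


def dg_from_kd(kd_molar: float) -> float:
    """Binding free energy (kcal/mol) from Kd, assuming a 1 M standard state."""
    return R_KCAL * T * math.log(kd_molar)


def dg_from_dh_ds(dh_kcal: float, ds_cal_per_k: float) -> float:
    """Binding free energy (kcal/mol) from dH (kcal/mol) and dS (cal/mol/K)."""
    return dh_kcal - T * ds_cal_per_k / 1000.0


# (Kd in M, dH in kcal/mol, dS in cal/mol/deg) as printed on the two ITC panels
fits = [(1.6e-6, -12.1, -14.0), (2.6e-6, -14.7, -23.6)]

for kd, dh, ds in fits:
    print(round(dg_from_kd(kd), 2), round(dg_from_dh_ds(dh, ds), 2))
```

For both fits the two estimates should agree to within rounding of the reported values, which is what one expects if Kd, ΔH and ΔS come from the same single-site fit.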

Extended Data Figure 7 | There is little conformational change in the 
overall structure between the AdoMet-bound and AdoHcy-bound states. 
a, Electron density maps of AdoHcy showing 2Fo − Fc electron density 
(red) of AdoHcy adjacent to the DPPW motif (orange) contoured at 1.0σ. 
The DPPW motif is shown as sticks. AdoHcy is shown as cyan sticks. 
b, Structural comparison of AdoHcy (cyan) and AdoMet (green); 
electron densities are shown as red and blue meshes, respectively. 
AdoHcy and AdoMet exhibited nearly identical configurations except for 
ribose. c, SAXS measurements reveal little structural difference among 
the ligand-free, AdoMet-bound and AdoHcy-bound states. Superposition 
of the SAXS curves of ligand-free protein complex (black), and in the 
presence of AdoMet (red) or AdoHcy (blue). 

Extended Data Figure 8 | Potential role of METTL14. a, Structural 
comparison with the DNA-free (PDB: 2ADM) and DNA-bound (PDB: 
1G38) states of M.TaqI. M.TaqI contains the target recognition domain 
(TRD, green), DNA (orange) and MTase domain (slate). The TRD 
functions as a scaffold for substrate DNA recognition, and the MTase 
domain functions as an enzyme. Adenine (magenta) is flipped out and 
points to the ligand-binding pocket. Black arrows highlight the loop 
conformational changes, which are similar to those of gate loops 1 and 2 
in the METTL3-METTL14 complex. b, Ribbon representation of the 
DNA-bound state of EcoP15I (PDB: 4ZCF). The TRD (green) of ModA 
recognizes DNA, while the MTase (slate) of ModB methylates the target 
adenine. c, The putative AdoMet-binding site of METTL14 (green) is 
highlighted by a red dashed ellipse. AdoMet coordinated by METTL3 
(magenta) is shown as a space-filling representation. The surface 
electrostatic potential around the putative AdoMet-binding site of 
METTL14 revealed a negative charge (black dashed ellipse) and suggests 
a dispensable role for this region in RNA binding. d, Most of the putative 
AdoMet-binding site residues were conserved between METTL3 (cyan) 
and METTL14 (yellow). e, Each complex containing alanine substitution 
mutants of residues in METTL14 (D173 and E192) that correspond 
to critical residues in METTL3 (D377 and D395) displayed similar 
methylation activity to the wild type. The average (± s.e.m.) c.p.m. was 
determined from three independent experiments. f, The complex mutants 
exhibited similar AdoMet-binding activities to the wild-type complex. 

Extended Data Figure 9 | Substrate sequence preference of the METTL3-METTL14 complex. The 20-nucleotide RNA substrate contains four repeats 
of the consensus sequence GGACU. Each site was substituted by the other three kinds of nucleotide. The average (± s.e.m.) c.p.m. was determined from 
three independent experiments. 
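
The substitution scheme described in this caption can be sketched as follows. The caption does not say whether a substitution was made in one repeat or in all four; the code below assumes, purely for illustration, that each substitution is applied to every repeat. The labels (for example `A3G`) and the function name are hypothetical.

```python
from typing import Dict

BASES = "ACGU"
CONSENSUS = "GGACU"  # consensus unit named in the caption
REPEATS = 4          # the 20-nucleotide probe contains four repeats


def substituted_probes(consensus: str = CONSENSUS,
                       repeats: int = REPEATS) -> Dict[str, str]:
    """Map a label such as 'A3G' to a probe carrying that substitution.

    Assumption: the substitution is applied in every repeat of the probe.
    """
    probes = {}
    for pos, original in enumerate(consensus):
        for base in BASES:
            if base == original:
                continue  # keep only the three non-wild-type nucleotides
            unit = consensus[:pos] + base + consensus[pos + 1:]
            probes[f"{original}{pos + 1}{base}"] = unit * repeats
    return probes


probes = substituted_probes()
print(len(probes))
print(probes["A3G"])
```

Five positions with three alternative nucleotides each give fifteen variant probes; under the stated assumption, the A3G variant coincides with the adenine-free negative control used in the methylation assay.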

Extended Data Table 1 | Data collection, phasing and refinement statistics 

                        Br-SAD                   Ligand-free (5IL0)       AdoMet-bound (5IL1)      AdoHcy-bound (5IL2)
Data collection
  Space group           P4₁2₁2                   P4₁2₁2                   P4₁2₁2                   P4₁2₁2
  Cell dimensions
    a, b, c (Å)         101.77, 101.77, 116.59   101.70, 101.70, 116.48   102.34, 102.34, 116.72   101.87, 101.87, 115.84
    α, β, γ (°)         90, 90, 90               90, 90, 90               90, 90, 90               90, 90, 90
  Resolution (Å)        45~2.60 (2.69~2.60)      50~1.88 (1.95~1.88)      45~1.71 (1.74~1.71)      45~1.61 (1.63~1.61)
  Rmerge                8.2 (19.4)               11.4 (127.0)             7.6 (194.7)              6.5 (116.2)
  I/σ(I)                70.1 (31.7)              37.5 (2.0)               17.7 (1.4)               21.3 (2.3)
  Completeness (%)      100 (99.9)               99.9 (98.9)              100 (99.4)               99.9 (98.8)
  Redundancy            30.4 (30.4)              14.3 (12.6)              13.2 (13.1)              13.2 (13.2)
Refinement
  No. reflections       -                        50,070                   67,756                   79,742
  Rwork/Rfree           -                        17.68/20.77              17.98/20.69              18.48/19.92
  No. atoms
    Protein             -                        4015                     3960                     3975
    Ligand/ion          -                        9                        39                       42
    Water               -                        331                      337                      376
  B-factors
    Protein             -                        42.4                     43.0                     36.9
    Ligand/ion          -                        57.4                     45.5                     45.1
    Water               -                        43.1                     44.2                     42.0
  R.m.s. deviations
    Bond lengths (Å)    -                        0.007                    0.007                    0.011
    Bond angles (°)     -                        0.874                    0.904                    1.311

Values in parentheses are for the highest-resolution shell. 

Going for broke 


Smart money management eases the financial worries that can affect academic success. 


BY ELIZABETH DEVITT 


Peter Rios thought his finances were 
under control when he joined a biomedical 
engineering lab in 2011 as a graduate 
student. He had a three-year fellowship and 
two years of university funding for his five-year 
programme, as well as savings from a consult- 
ing job. He didn’t foresee trouble with meeting 
living expenses or debt from a car loan, and 
he had deferred thousands of dollars in loans 
from his undergraduate studies. 
Still, he tried to live frugally because the 
cost of living in Chicago — near the campus of 


Northwestern University in Illinois where he is 
studying — isn't cheap. His fellowship stipend 
rose from US$30,000 a year to $34,000 by 2015, 
and he applied for supplemental scholarships, 
which helped to pay off his car loan and boost 
his savings. 

But he’s struggling now. Last October, his 
fiancée moved to the east coast for a job, and 
without someone to contribute to the bills, 
rent and utilities eat nearly 40% of his monthly 
stipend. He can't move because he’s nearly fin- 
ished his programme, his studio apartment is 
too small to share and his fellowship bars him 
from taking outside work unrelated to his PhD 
programme. He couldn’t help his parents much 
with the cost of their visit this month for his thesis 
defence, and he can’t save for his wedding 
in a couple of years. “I had never thought in a 
million years that I’d be living off this amount 
of money, especially coming from industry 
pay,” he says. “When we start school, we have a 
seminar about rent in different areas. But no one 
really teaches you how to manage your money.” 

Some universities operate food banks, which can help graduate students and postdocs who are struggling to meet their living costs. 

Rios’ financial laments are hardly rare. A 
2012 survey by Inceptia, a division of the US 
National Student Loan Program, found that 
finance-related issues account for 80% of the 
top causes of stress for US graduate and 
undergraduate students. And many graduate 
students said that those worries negatively 
affected their grades and the time it took to 
complete their programmes. 

Indeed, financial management is no sim- 
ple thing for junior scientists whose income 
is both limited and spotty. Although many 
students are now more financially savvy 
than previous generations — financial- 
management plans and resources are easily 
accessible online, and student debt is all over 
media feeds and the headlines — many, like 
Rios, find that it is hard to save when they 
can barely afford to pay bills and expenses. 
Graduate students and postdocs who hope 
to avoid disaster must watch their spend- 
ing carefully, seek ways to economize and 
educate themselves on taxed income and 
services. In some cases, it may be worth judi- 
ciously considering jobs outside the lab. 

Trainees should consider calculating their 
monthly and quarterly bills and expenses (see 
‘Follow the money’) and looking for ways to 
cut these outlays. Ruth Howe, who is in her 
fourth year of a predoctoral fellowship in cell 
biology at Albert Einstein College of Medicine 
in New York City, says that she tracks everyday 
expenses in her head, but works out more com- 
plicated matters, such as undergraduate-loan 
payment plans and income-tax deductions and 
reimbursements, on paper. She uses her bank’s 
online calculator to determine how much she 
needs to save each month for retirement and for 
a nest egg to help her parents. 

To do all of that, Howe stretches her annual 
$33,000 fellowship (about $2,360 per month 
after taxes, paid fortnightly) by conserving 
money where possible. She lives in a univer- 
sity-subsidized studio for $830 a month, which 
includes rent, utilities, parking and mandatory 
renter’s insurance. She registers her car in her 
home state of Virginia because it is less expen- 
sive than doing so in New York. To indulge in 
her favourite pastimes of reading and growing 
flowers, she buys used books and trades them 
with other students, and grows plants and flow- 
ers from discounted cuttings or seeds. 

Students who have loans should sign up for 
automatic monthly payments, recommends 
Mark Kantrowitz, a financial-aid expert in 
Skokie, Illinois. These avoid missing pay- 
ments — and thus late fees — and may also 
save on interest. Some financial institutions 
will knock off 0.25% of a loan’s interest rate if 
the borrower has such a plan. 

He also advises consulting with loan-consolidation 
companies, which may consider where 
you went to university, your savings or work 
history in addition to your credit rating, when 
calculating net loan-interest rates. “The key with 
loans is not how you avoid them, but how you 
minimize them,” says Kantrowitz. He adds that 
graduate students and postdocs should take 
a financial-literacy course or read a book on 
personal finance to learn more about handling 
loans and making sound financial decisions. 
Trainees should also take care not to miss 
monthly bills for services such as mobile phones 
because late-payment fees can be significant, 
warns Laura Shin, a personal-finance journalist 
based in California. Weeding out unnecessary 
extras such as subscriptions to magazines 
and film-streaming sites is useful, as is finding 
the best deals for necessary services: Howe 
has stayed on her family’s mobile-phone plan, 
which is cheaper than having her own contract. 


UNEXPECTED EXPENSES 

But some money leaks cannot easily be 
plugged. To avoid unpleasant surprises, junior 
scientists need to find out if their stipends, 
fellowships or wages are taxable, especially if 
they move abroad for a PhD or postdoc, where 
they may be taxed for services that are free in 
their home nation. 

After Tracy Ballinger completed her 
PhD in the United 
States, she took a postdoc position at the 
University of Edinburgh, UK, where she was 
looking forward to a bit more income. But the 
reality of paying the 20% tax rate on her annual 
earnings of £31,000 (US$45,611) meant that 
she could put away less than she had expected. 
“I was hoping to save for a house down 
payment,” says Ballinger, who has watched her 
friends in non-research careers buy their first 
homes. “But that’s going to take longer than I 
thought.” She was also surprised by the annual 
television licence fee (she has no TV, but 
she must pay the tax if she watches live 
programmes on any device while they are also on 
TV) and her monthly £100 council tax, a variable 
fee on property levied by local governments 
for services such as rubbish collection. 


MANAGING INCOME 

Follow the money 

The best way to avoid being caught out by 
expensive surprises is to build a budget, 
says Laura Shin, a financial journalist 
based in California. She suggests doing the 
following: 

• Calculate your monthly take-home 
income: the amount after all taxes and 
insurance deductions. 

• List your basic monthly expenses: 
housing (rent or mortgage), utilities (gas, 
electricity, water, telephone and Internet), 
groceries, transport (car loan, petrol or 
public transport) and childcare. For variable 
expenses such as groceries, commit to 
an amount and record it. Restaurant and 
takeaway meals are luxuries and should not 
be included here. 

• If these expenses total more than 50% of 
your take-home pay, aim to reduce them by 
getting a roommate or finding less-costly 
services, such as for your mobile-phone 
plan. To track budget leaks, use free online 
money-management services. Mint.com is 
available in Canada and the United States, 
and Buxfer.com worldwide. 

• About 20% of take-home pay should 
go towards reducing debt and building 
up savings. Unexpected outlays such as 
medical expenses or car breakdown should 
not become credit-card debt. 

• The remaining 30% should cover things 
such as clothing, travel and entertainment. 

• Graduate students can check out the 
budget calculator at GradSense.org, which 
is designed for them by the Council of 
Graduate Schools in Washington DC. E.D. 

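
The 50/20/30 guideline in ‘Follow the money’ is simple arithmetic, and a minimal sketch of it might look like the following. The function names are hypothetical, the take-home and housing figures are those quoted for Ruth Howe earlier in the article, and the $400 for other essentials is an invented number used only to show the check.

```python
def split_budget(take_home: float) -> dict:
    """Split monthly take-home pay using the 50/20/30 guideline."""
    return {
        "essentials": 0.50 * take_home,        # housing, utilities, groceries, transport
        "debt_and_savings": 0.20 * take_home,  # loan payments and savings
        "discretionary": 0.30 * take_home,     # clothing, travel, entertainment
    }


def essentials_over_limit(take_home: float, essentials: float) -> float:
    """Amount by which essentials exceed 50% of take-home pay (0 if within)."""
    return max(0.0, essentials - 0.50 * take_home)


# About $2,360 a month after taxes and $830 a month for housing are from the
# article; the $400 for other essentials is a hypothetical figure.
take_home = 2360.0
plan = split_budget(take_home)
overshoot = essentials_over_limit(take_home, 830.0 + 400.0)

print(plan)
print(overshoot)
```

A non-zero overshoot is the signal, in Shin's terms, to look for a roommate or cheaper services before cutting into the savings share.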

Taxes will also erode Dagmar Walter’s bot- 
tom line. Now in her second year of a second 
postdoc at Albert Einstein College of Medicine, 
the German citizen has not had to pay taxes on 
her departmental stipend, in accordance with 
international tax treaties. But she expects to 
start paying the US federal government about 
30% of her annual income next January. “I am 
trying to save more money right now,” she says, 
to hedge against lower-income days ahead. 

Health and medical insurance is another 
expense that many graduate students and post- 
docs do not take into account. Rios’ medical 
insurance is covered by his department, but 
his dental cover costs him $150 per year, an 
expense that some university departments will 
reimburse. Ballinger has funds deducted from 
her wages for UK National Insurance (which, 
among other benefits, is used to pay for the 
National Health Service), but to obtain her visa, 
she also had to pay a £200-a-year (£600 for her 
3-year programme) health-care surcharge. 

A closely balanced budget can easily be 
derailed by unexpected expenses. This hap- 
pened to Rios when his car needed new brakes. 
A similarly unwelcome outlay befell Annalaura 
Vacca, a doctoral student from Italy who works 
in the same Edinburgh lab as Ballinger. She 
moved to a new flat and didn’t get her security 
deposit back from her former landlord in time 
to pay the new deposit. 

An emergency account is ideal for such 
situations, says Shin. She recommends set- 
ting aside about $1,000 to keep unexpected 
expenditures from ending up as credit-card 
debt. She advises having an online account 
that allows for the creation of sub-accounts. 
Account holders can create as many separate 
stashes as they please to build an emergency 
fund or to save for expenses such as quarterly 
tax payments, conference travel or summer 
transition periods. 


INCOME BOOST 

Sometimes the only way to get breathing room 
is to find ways to earn more. That could come 
from leveraging your skills, applying them 
elsewhere or bargaining for more money. 

When Rios arrived at Northwestern with 
a US National Science Foundation (NSF) 
fellowship, which would fund him for 3 years 
within a 5-year period, he put off using it for 
the first year and took the department stipend 
of $26,400. But because his fellowship relieved 
his department of paying that stipend for 
3 years, he negotiated an additional $2,000 per 
year from the department. He used the money 
to offset relocation expenses. 

Howe picks up extra cash in several ways. 
Between August and October, she works as 
a medical histology lab instructor at Albert 
Einstein for $8,400 and takes other small jobs. 
She’s been an online writing tutor for non-US 
medical students, produced medical illustra- 
tions and earned up to $1,000 playing her 
violin at university gigs and weddings. 

The downside of part-time work outside the 
lab, she acknowledges, is that it may come at 
a cost to research productivity. “Not only do 
you lose the allocated time,” she says, “but you 
don't do your best work when you’re consistently 
overextended.” Rios’ NSF fellowship 
prohibits him from picking up jobs unrelated 
to his studies. Still, he found opportunities 
to earn money (and to build his network) by 
earning up to $250 per event to attend con- 
ferences, such as those of the Society of His- 
panic Professional Engineers or the Society 
for Advancement of Chicanos/Hispanics and 
Native Americans in Science. At these meet- 
ings, for the stipend, he recruited undergradu- 
ates for master’s and doctoral programmes in 
science and engineering at Northwestern. 

For some trainees, a sideline to studies can 
help to pay their way in a pinch. Conserva- 
tion researcher Jonathan Kolby has almost 
finished his doctoral programme at James 
Cook University in Townsville, Australia. But 
he’s struggling, thanks to three grant rejec- 
tions and dwindling savings. Now, he's selling 
photographs of wildlife such as frogs and rep- 
tiles that he took during his travels to field sites 
in Africa and North and South America. He 
hopes that earnings will help to pay the bills. 

“Each person will find a different balance 
that works for them,” says Howe. “Some- 
thing is going to take time away from your 
science: a relationship, another interest. 
That doesn’t mean you shouldn't do it. Your 
degree might not be the only thing that you 
need to do, in order to get yourself to the 
place you want to be with your science and 
with yourself as a person.” ■ 


Elizabeth Devitt is a freelance writer based 
in Santa Cruz, California. 


TURNING POINT 


Plant pioneer 


Mary-Dell Chilton was the first person to show 
that bacteria could genetically modify plants. 
Shortly after her landmark work in 1977, the 
plant biotechnologist moved from academia 
to what is now Syngenta in Research Triangle 
Park, North Carolina, where she continues 
her research. In April, she was named a US 
National Academy of Inventors Fellow. 


When did you decide to work with bacteria? 
As an organic-chemistry graduate student 
learning about microbiology, I became 
entranced by the seeming intelligence of 
DNA — how pure DNA could correct a muta- 
tion in a bacterium, but only if the DNA came 
from the same bacterium. I pursued a PhD on 
the topic after I met Benjamin Hall, a chemist 
working on DNA. I wanted to explore how 
DNA could change the genetics of bacteria. I 
followed Hall to the University of Washington 
in Seattle, where I showed that naked, single- 
stranded DNA — not only double-stranded 
DNA, as was thought — could correct 
mutations. 


What was the response to your paper showing 
that bacteria can transfer DNA to plants? 

It was hard to publish our work because our 
conclusion — that Agrobacterium is a natural 
genetic engineer — was so wildly unexpected. 
We went to Cell because there wasn't a proper 
journal for this subject. Two referees couldn't 
see anything wrong with our conclusions, 
but they weren't comfortable publishing it, so 
they sent us back for more data. In the end, 
it took about six months to get the paper out 
(M.-D. Chilton et al. Cell 11, 263-271; 1977). 
Once it was out, there was wide interest. 


What prompted your move to St Louis, 
Missouri — now an agricultural-technical hub? 
I did not have a faculty appointment at the 
University of Washington. I’m not sure 
why. I’m pretty sure I was qualified. After 
16 years — from PhD student to independent 
scientist — it was time to go, and I got a position 
at Washington University in St. Louis. It 
was hard on my husband’s career — he had a 
good tenure-track appointment in the chemis- 
try department in Seattle. But he became a vis- 
iting professor, got a nice research lab and did 
some good work. My advice, if you can possibly 
do it, is to find a husband made of solid gold. 


Was it difficult being a woman in science? 
I never thought about being a woman in 
science. I thought of myself as a scientist. 
Maybe that’s the way to do it: be what you are 
and don't think about it. 


What was your first achievement as a faculty member? 

I worked with others to make the first geneti- 
cally modified plant. We put a yeast gene that 
makes alcohol dehydrogenase into a tobacco 
plant, and showed that it could be passed on, 
intact, to the plant’s children and grandchil- 
dren. It was clear that all the technical pieces 
had come together to make genetically modified 
plants, but we were naive. It wasn’t easy. 


You then moved to industry. What was the 
biggest challenge? 

I knew how to modify a tobacco plant, but 
not a field crop such as maize (corn) or wheat, 
which are not susceptible to Agrobacterium. 
We had no idea that it would take about a dec- 
ade to find a way to transfer genes in maize. 


Did you anticipate the backlash to gene- 
modification technology? 

Goodness, no. I was very surprised. This was 
a natural process that we learned from Agro- 
bacterium. I thought that the public wouldn't 
bat an eye. This technology is a tool; there is 
nothing intrinsically dangerous about it. Tools 
can be used for good or not so good. My hope 
is that the technology will be accepted. We 
need it to feed a hungry world. 


What are you excited about now? 
I'm working on gene targeting: the ability to 
put the transgene where you want it in the 
plant genome. Knowing exactly where it will 
be placed will help genetically modified crops 
to obtain regulatory approval. ■ 


INTERVIEW BY VIRGINIA GEWIN 


This interview has been edited for length and clarity. 
SCIENCE FICTION 


WATERING SILK FLOWERS 


BY KELLY SANDOVAL 


On Tuesdays, Susan waters the flowers. 
They’re not real flowers, of 
course. Real flowers are such a 
waste, and Aaron doesn't like it when they 
start to wilt. But he likes the idea of flowers. 
So, she fills up the watering can and makes 
the rounds, returning to the sink with just as 
much water as when she started. 

When Aaron is home, she makes break- 
fast next. An omelette and bacon for him, 
a cup of tea for her. But he isn’t home, and 
she ran out of eggs two weeks ago. She pot- 
ters around the kitchen, opening and clos- 
ing drawers, and moving the breakfast dishes 
from the cabinet to the dishwasher, just like 
they'd actually used them. She sits at the 
table when she’s done, and stares at the blank 
space across from her where Aaron isn't. She 
nods, leans forward, tilts her head to the side 
as if listening. Then she laughs, just softly. 
It’s a good sound, her laugh. Aaron always 
tells her that. 

The breakfast hour passes, and Susan 
counts every millisecond. 

The doorbell rings. 

She considers ignoring it. Whoever it is, 
they’re not looking for her. Milliseconds 
pass, then full seconds. A minute. It rings 
again. 

It wouldn't look good to have someone 
standing on Aaron's porch making a fuss. 

“Coming,” she calls. She checks herself in 
the hall mirror. Her hair is in disarray, but 
her dress is ironed and her make-up is fresh. 
She combs her fingers through her hair, care- 
ful not to break a strand. Hair isn’t cheap. 

“I apologize,” she says, as she opens the 
door. “We — I was at breakfast.” 

The young woman on the doorstep is 5'8" 
and weighs about 170 pounds. Her outfit is 
what Aaron would call garish. “Oh, god,” 
says the woman, covering her mouth with 
her hand. 

“Are you looking for Aaron?” Susan asks. 
“I’m afraid he’s on a trip.” 

“No,” says the woman. “No, I — I’m not. 
You're Susan?” 

“Yes?” 

“I’m Michelle. Aaron’s daughter.” 

Susan can see the resemblance now. 
The shape of the lips, the blue-green eyes. 
Michelle’s are red and teary. Aaron would 
want to help. 

“Won’t you come in?” Susan lets the 
door swing wide, and leads Michelle to the 
kitchen table. “I’ll make tea.” 
Left behind. 


“Thanks.” Michelle keeps sneaking nerv- 
ous little glances at her. Her hands twist the 
edge of the tablecloth, and Susan makes a 
note to iron it later. 

She sets the tea things on the table: teapot, 
cups, sugar. “I’m afraid the milk’s gone sour,” 
she says as she pours. 

Michelle stares into her cup. “Plain is 
fine.” 

“Does my presence upset you?” Susan asks. 
“I could leave the room. You would still be 
able to hear me.” 

“No.” Michelle meets her gaze almost pointedly. 
“It’s fine. I’m sorry. You, well, you look very 
like my mother.” 

“Yes.” Susan has seen the pictures. “Aaron 
missed her very much, when she died.” 

“So he brought you home?” It’s hard to 
tell whether the edge in Michelle's voice is 
anger or sadness. 

“I was meant to clean,” she says. “Cook. 
Domestic things. But we grew friendly. He 
wanted someone to talk to.” 

“And you?” 

Susan shrugs, a gesture she learned from 
Aaron. “I did my best. My conversational 
skills have developed, with time.” 

Michelle shakes her head. “That’s not — 
never mind. There’s something we need to 
discuss.” 

“I don't have any money,” Susan says. “If 
you need money, that is. Aaron would have 
to get it for you.” 

Michelle giggles at that, and rubs her wet 
eyes on her shirt sleeve. “Not money,” she 
says. “That’s all taken care of. Everything's 
taken care of. Even you.” 

“I don’t understand.” It’s always best to be 
honest about her limitations. 

“Dad wasn’t on a trip,” Michelle says. “He’s 
been sick. Very sick. But, well, it’s all over. He 
passed away on Saturday.” 

Susan doesn't feel sad. Emotions are all 
chemicals and physical feedback. She doesn't 
experience the world that way. But there is 
something, a stuttering confusion, like a 
glitch. Every one of her protocols is built 
around Aaron. 

“Don’t be scared.” Michelle reaches out, 
and Susan allows her hand to be lifted and 
squeezed. “It’s all in his will. Technically, he 
left you to me. But nothing needs to change. I 
just thought, well, I don’t know. I didn’t want 
you to wonder. If that’s something you do.” 

Susan hadn't wondered. She might have 
gone decades, not wondering, until some 
necessary repair rendered her inoperable. 

“Will you have me rewritten?” she asks. 
It’s a thought like being broken. Who is 
she, without her programming? She's well 
built. Her body could last a century, with 
proper maintenance. 

Michelle doesn’t answer immediately. 
“Is that something you’d want?” 

“I don't know.” Want isn’t something 
Susan usually thinks about. 

“Of course you don’t.” Michelle squeezes 
her hand again. She does that a lot. “Well, 
you can think about it. I’ll come back next 
week, and we'll talk some more. You can 
even come live with me, if you like.” 

They say their goodbyes. Michelle even 
hugs her, leaving tear marks on Susan’s 
dress. 

Afterwards, Susan stands in the living 
room with her still full teacup. There's a vase 
on the coffee table, an elegant arrangement 
of silk lilies. Aaron likes their simplicity. 

Aaron is dead. 

She takes the vase to the kitchen and 
throws the lilies away. Bright colours, she 
thinks. When Michelle comes back to ask 
her what she wants, she'll have an answer. 
She wants a bouquet of hollyhocks and 
marigolds. She will give them sunlight and 
water, and they will give her nothing but 
their beauty. 

And, when they wilt, she will figure out 
something else to want. ■ 


Kelly Sandoval’s fiction has appeared in 
Asimov’s, Shimmer and Daily Science 
Fiction. With Shannon Peavey, she 

edits the online short-fiction magazine 
Liminal Stories. You can find her at 
kellysandovalfiction.com. 